InvMem is a collection of short notes on mathematics and programming.

#### Contents

#### N1 / Kepler-style coincidences / #Math / 2014.09.23

Take a Kepler triangle - a right triangle with sides 1, √φ, φ. Then the length of its circumscribed circle nearly matches the perimeter of a square with a side equal to its middle edge. This yields an approximation τ ≈ 8 / √φ. Here are some coincidences having a similar form:

Constant | Approximation | Relative Error |
---|---|---|

τ | 12 / 5 φ^{2} | 1 / 65270 |

τ | 9 / 2 ^{3}√e | 1 / 2144 |

τ | (11 / 7)^{3} φ | 1 / 1409 |

τ | 8 / √φ | 1 / 1043 |

e | ^{3}√10 / 7 S | 1 / 3770 |

e | 7 / 4 √S | 1 / 3305 |

e | 5 / Ln τ | 1 / 1210 |

e | 9 / 8 S | 1 / 1186 |

γ | 1 / √3 | 1 / 4289 |

P | √8 / 11 S | 1 / 3841 |

P | 9 / 11 φ | 1 / 1519 |

Notation: | |

τ | Circle constant |

e | Base of the natural logarithm |

φ | Golden ratio |

S | Silver ratio |

P | Plastic number |

γ | Euler-Mascheroni constant |

#### N2 / Note on means / #Math / 2015.02.10

Consider a family of means of two numbers a, b:

M(a, b, p) = ∫_{a}^{b}x^{p + 1} dx / ∫_{a}^{b}x^{p} dx

In other words, M(a, b, p) is the mean of a distribution with density ∝ x^{p} in the interval [a, b]. Many well known means belong to this family:

p | Expression | Name |
---|---|---|

1 | (2 / 3) (a + b - a b / (a + b)) | ??? |

0 | (a + b) / 2 | Arithmetic |

-1/2 | (a + √a b + b) / 3 | Heronian |

-1 | (b - a) / (Ln b - Ln a) | Logarithmic |

-3/2 | √a b | Geometric |

-2 | a b (Ln b - Ln a) / (b - a) | ??? |

-3 | 2 a b / (a + b) | Harmonic |

#### N3 / Generalized Motzkin numbers / #Math / 2015.02.16

Number of strings of length N consisting of symbols from the alphabet of size b and correctly placed brackets is

⸤N / 2⸥ | |

∑ | (N | 2 i) b^{N - 2 i} C_{i} , |

i = 0 |

where (N | 2 i) is a binomial coefficient and C_{i} is the ith Catalan number. This result is easy to interpret: C_{i} is the number of possible configurations of 2 i brackets, (N | 2 i) is the number of ways to place a bracket configuration among the N symbols, and b^{N - 2 i} is the number of possible configurations of the non-bracket symbols.

b | OEIS | First terms starting with N = 0 | |
---|---|---|---|

0 | A126120 | 1, 0, 2, 0, 5, 0, 14, 0, 42, ... | Catalan numbers interleaved with 0s |

1 | A001006 | 1, 1, 2, 4, 9, 21, 51, 127, 323, ... | Motzkin numbers |

2 | A000108 | 1, 2, 5, 14, 42, 132, 429, 1430, 4862, ... | Catalan numbers |

3 | A002212 | 1, 3, 10, 36, 137, 543, 2219, 9285, 39587, ... | |

4 | A005572 | 1, 4, 17, 76, 354, 1704, 8421, 42508, 218318, ... | |

5 | A182401 | 1, 5, 26, 140, 777, 4425, 25755, 152675, 919139, ... | |

6 | A025230 | 1, 6, 37, 234, 1514, 9996, 67181, 458562, 3172478, ... |

#### N4 / Scale derivative / #Math / 2015.02.28

Derivative of f(x) is defined as the limit (f(x + h) - f(x)) / h, h → 0. By replacing the shift f + h with scaling x / s, we obtain the scale derivative: g(x) = Limit (f(x / s) - f(x)) / Ln s, s → 1. In the logarithmic x scale, scaling is equivalent to shifting and scale derivative to the normal derivative: t = Ln x; q(t) = f(x); g(x) = Limit (q(t - Ln s) - q(t)) / Ln s, s → 1 = -dq(t) / dt = -df(x) / d Ln x = -x f'(x). It is assumed that x > 0. For x < 0, same results apply in a mirrored form.

g(x) to f(x) is what a gnomon is to the figurate numbers. Infinitesimally larger version of f(x) is obtained by adding g(x) to it: f(x / s) ≈ f(x) + g(x) Ln s. We can continuously "grow" f(x) in size by adding infinitesimal amounts of scaled g(x). Under the condition f(∞) = 0, the original function can be reconstructed as a superposition of g(x) with exponentially increasing scales:
f(x) = ∫_{-∞}^{0} g(x / e^{z}) dz, z = Ln s.
This is easily proven by using the identity g(x) = -dq(t) / dt. If the upper integration limit is different, f(x) is obtained at the corresponding scale.

Alternatively, f(x) can be seen as being reconstructed from area-normalized g(x) at scales from 0 to 1:
f(x) = ∫_{0}^{1}g(x / s) / s ds.
We can "grow" the distribution with PDF equivalent to f(x) by sampling from PDF ∝ g(x / s) starting with s = 0 and increasing it linearly. At any time, distribution of all the values generated so far will match f(x) up to scaling. This trick will only work if f(x) is non-increasing. Otherwise, g(x) will contain negative values and cannot be a proper PDF.

#### N5 / Generating generalized Gaussian, Cauchy and related random variables / #Math #Algorithms/ 2015.03.31

Consider a generalized Gaussian (PDF ∝ e^{-|x|p}) and generalized Cauchy (PDF ∝ 1 / (1 + |x|^{p})) random variables. An easier to sample bounding functions are required to apply the rejection sampling technique. It is easy to verify that both distributions are log-log convex upwards. Therefore they can be bounded by tangent lines in the log-log scale - that is, by power functions. Piecewise power law PDFs are easily handled by inversion sampling of their pieces with corresponding probabilities. It suffices to sample the x > 0 half of the distribution and symmetrize it by multiplying the results by ±1 at random.

Let's bound f(x) ∝ PDF with h(x) consisting of two segments: h(x) = 1 if x < x_{1}, A x^{-q} otherwise. The segments match at x_{1}, so x_{1} = A^{1 / q}. Area below h(x) is S = A^{1 / q} (1 + 1 / (q - 1)). The efficiency (acceptance rate) is ∫_{0}^{+∞} f(x) dx / S. Let's denote the point where h(x) touches f(x) as x_{e}. Then A = f(x_{e}) x_{e}^{q} and q is the log-log slope of f(x) at x_{e}. Expressing A through q and solving dS / dq = 0 yields the optimal q. This process leads to a transcendental equation for the generalized Cauchy density: q / (1 - q) = Ln(1 - q / p). A slightly less efficient function that avoids the problem is obtained by assuming q = p (tangent at +∞).

Name | f(x) | q | A | Efficiency | Min efficiency |
---|---|---|---|---|---|

Generalized Gaussian | e^{-|x|p} | p + 1 | ((q / p) / e)^{q / p} | (p^{-1}! e^{1 / p}) / (1 + 1 / p)^{1 + 1 / p} | 0 at p → 0 |

Generalized Cauchy | 1 / (1 + |x|^{p}) | p | 1 | τ (p - 1) / (2 p^{2} Sin(τ / (2 p))) | τ / 8 at p = 2 |

Scale derivatives of the examined functions can be enclosed with three segments: x^{p} if x < x1, M if x < x2, A x^{-q} otherwise, where M = Max f(x). Scale derivative of Cauchy density can be effectively sampled by assuming q = p. In the Gaussian case, optimal q is a solution to the transcendental equation (1 / p + 1 / (1 - q)) q = Ln(1 + q / p). Approximate solution obtained with the aid of Newton's method is listed in the table below.

f(x) | M | q | A | Min efficiency |
---|---|---|---|---|

x^{p} e^{-|x|p} | 1 / e | Max((15 / 7) p + 13 / 19, 1 + p) | ((1 + q / p) / e)^{1 + q / p} | 0 at p → 0 |

x^{p} / (1 + |x|^{p})^{2} | 1 / 4 | p | 1 | 2 / (1 + Ln 4) at p → +∞ |

#### N6 / Triplane hexagonal coordinates / #Math #Programming #Board_Games / 2015.04.06

Consider a hexagonal-shaped region of a hexagonal grid. A coordinate system can be formed by three semi-axes 120° apart starting from the center of the region.

The system is well suited for board games. Most hexagons are encoded with combinations of two symbols. Axis hexagons are encoded with one symbol. Compared to existing hexagonal chess variants, the system has a pleasant symmetry. Each symbol except 0 appears in equal number of hexes. Hexagonal array can be efficiently stored as three rectangular arrays overlapping at the axes. Conversion to the cubic (3D) coordinates is straightforward: triplane coordinates (n, a, α) are equal to the absolute values of the coordinates of 3D axes (x, y, z) lying in the corresponding sector: (n, a) = (-z, y), (a, α) = (-x, z), (α, n) = (-y, x).

#### N7 / Short brainfuck programs / #Programming / 2015.04.22

Following programs were generated with Ascension framework. Memory cells are assumed to be integers [0 .. 255], wrapping around.Text output:

Program | Length | Output |
---|---|---|

+>-[>>+>+[++>-<<]-<+<+]>---.<<<<++.<<----..+++.>------.<<+.>.+++.------.>>-.<+. | 79 | Hello World! |

+[-[<<[+[--->]-[<<<]]]>>>-]>-.---.>..>.<<<<-.<+.>>>>>.>.<<.<-.<<+. | 66 | hello world! |

For comparison, shortest human-coded "Hello World!" program is 89 instructions long:

++++++++++[>>->++>+++>+++>----->+[++++++++<]<-]>>++.>+.>--..>+.>++.>---.<<.+++.<.<-.>>>+.

Calculating sequences in memory:

Program | Length | Output |
---|---|---|

-[+>+>--[<]>>] | 14 | Powers of 2 |

+[>>-->+[<]<+] | 14 | Powers of 2 |

>-[[<+>>>-<-<+]>] | 17 | Fibonacci sequence |

>-[[>-->+<<<+>+]>--] | 20 | Squares |

#### N8 / Lattice distance distortion / #Math / 2015.05.17

Take a regular lattice with each point connected to a few neighbouring points, thereby forming a graph. Let's only consider physically meaningful neighbour sets: a line connecting a lattice point to any of its neighbours should not intersect other Voronoi cells in more than one point. This requirement is stricter than the cell vertex connectivity. A neighbourhood can be denoted by the number of nearest lattice points satisfying the connectivity requirement. Graph edges can either have unit weights or have weights equal to the Euclidean distances between the points they connect (w suffix). Let {R} be a set of ratios of the shortest graph distances to the Euclidean distances between any two lattice points. Distance distortion is Max {R} / Min {R}. Distortions of various 2D and 3D lattices are summarized in the following tables. f, e, v denote face, edge and vertex cell connectivity.

Voronoi cell | Neighbors | Connectivity | Distortion | % |
---|---|---|---|---|

Triangle | 6 | ev | 2 | 100 |

Triangle | 3 | e | 3 / 2 | 50 |

Square | 8 | ev | √2 | 41 |

Square | 4 | e | √2 | 41 |

Triangle | 6w | ev | 2 / √3 | 15 |

Hexagon | 6 | e | 2 / √3 | 15 |

Square | 8w | ev | (1 + √2) / √5 | 8 |

Lattice | Voronoi cell | Neighborhood | Connectivity | Distortion | % |
---|---|---|---|---|---|

Simple cubic | Cube | 26 | fev | √3 | 73 |

Body-centered cubic | Truncated octahedron | 8 | f | √3 | 73 |

Simple cubic | Cube | 6 | f | √3 | 73 |

Simple cubic | Cube | 18 | fe | 4 / √6 | 63 |

Body-centered cubic | Truncated octahedron | 14 | f | √2 | 41 |

Hexagonal close-packed | Trapezo-rhombic dodecahedron | 12 | f | √2 | 41 |

Face-centered cubic | Rhombic dodecahedron | 12 | f | √2 | 41 |

Simple cubic | Cube | 18w | fe | (1 + √2) / √3 | 39 |

Body-centered cubic | Truncated octahedron | 14w | f | √3 / 2 | 22 |

Simple cubic | Cube | 26w | fev | (1 + √3) / √6 | 12 |

#### N9 / Fourier-based area in polar coordinates / #Math / 2015.05.23

Take a Fourier decomposition of a function defined on the [0, τ] interval: f(t) = a_{0} + ∑ a_{i} Sin(i t + φ_{i}). Then the area of f plotted in polar coordinates is (τ / 2) (a_{0}^{2} + ∑ a_{i}^{2} / 2).

#### N10 / On Weierstrass and Riemann functions / #Math / 2015.06.15

Take Fourier series:

f_{s}(x) = ∑ A

_{i}Sin(F

_{i}x)

f

_{c}(x) = ∑ A

_{i}Cos(F

_{i}x)

Weierstrass and Riemann-style functions correspond to exponential and power-law forms of A_{i} and F_{i}.

Name | A_{i} | F_{i} | Log-log slope β | Dimension D | N terms |
---|---|---|---|---|---|

Weierstrass | a^{-i} | b^{i} | 1 + 2 Ln |a| / Ln b | 2 - Ln |a| / Ln b | -Ln ε / Ln |a| |

Riemann | i^{-a} | i^{b} | 1 + (2 a - 1) / b | 2 - (a - 1 / 2) / b, b ≠ 1 | ε^{2 / (1 - 2 a)} / 2 |

Numerical calculations can only rely on a limited number of terms. Suppose we want to achieve a relative inacuraccy ε. Then ε^{2} is the ratio of the power of the truncated terms to the total power. Solving that equation gives the required number of terms.

There is a very simple non-rigorous way to estimate fractal dimensions of the functions in question. Power density around the i-th term is

P_{i}= A

_{i}

^{2}/ (F

_{i + 1 / 2}- F

_{i - 1 / 2}) ≈ A

_{i}

^{2}/ F'

_{i}

In that sense both Weierstrass and Riemann functions have an approximately power-law distributed power density. A well-known result pertaining to the fractional Brownian motion states that if the power spectrum is proportional to F^{-β}, then the fractal dimension is D = (5 - β) / 2.

Weierstrass Sin and Cos functions, a = √2, b = 2, D = 3 / 2:

Riemann Sin and Cos functions, a = 3 / 2, b = 2, D = 3 / 2:

Let's modify the cosine-based functions so that they have zero at the origin: f*(x) = 1 - f_{c}(x) / f_{c}(0). When plotted in polar coordinates, these modified functions resemble butterflies and leaves.

Modified Riemann function, D = 3 / 2, b = 2, 3, 4, 5:

#### N11 / Pawn protection of initial chess positions / #Math #Chess / 2015.08.01

Consider all possible arrangements of the chess pieces on the first rank. Under the restriction of bishops occupying opposite colored squares, there are 1440 reflection-distinct positions. 542 of these positions have every pawn protected at least once. Two maximally protected positions with only one pawn being protected once and all other pawns twice are RQBBRNKN and RKBBRNQN. Two minimally protected positions having three unprotected pawns are NNRQBKRB and NNRKBQRB.

#### N12 / Cantor's numbers / #Math / 2015.08.25

Recall Cantor's first uncountability proof. Start with an interval and a sequence of real numbers x_{i}. Find the first two numbers from x_{i} that lie inside the interval. These two numbers form the next interval. Repeat. In case of an infinite number of iterations, any number from the limit interval is not contained in x_{i}.

Take a sequence of rational numbers within the initial interval, sorted in order of increasing denominator, and in case of equal denominators in order of increasing numerator. What number does Cantor's bracketing procedure converges to?

Take two fractions A = an / ad, B = bn / bd, ad ≤ bd, satisfying the determinant condition |bn ad - an bd| = 1. The first number lying between A and B is the mediant of A and B: M = (an + bn) / (ad + bd). The next number is the mediant of A and M. We thus obtain a system of recurrent equations:

a_{i} = a_{i - 1} + b_{i - 1}

b_{i} = 2 a_{i - 1} + b_{i - 1}

Thanks to the mediant property, the determinant condition is preserved throughout the recurrence. Eliminating b_{i} gives a single recurrent equation a_{i + 1} = 2 a_{i} + a_{i - 1}, which has a solution a_{i} = α (1 + √2)^{i} + β (1 - √2)^{i}. Plugging in initial conditions gives α = (a_{0} + b_{0} / √2) / 2. The sought-after Cantor number is the limit:

_{i}/ ad

_{i}= αn / αd = (√2 an

_{0}+ bn

_{0}) / (√2 ad

_{0}+ bd

_{0})

#### N13 / Fraction digit tables / #Math / 2015.11.25

Following images depict binary and ternary digits of irreducible fractions in the (0, 1) interval. Fractions are ordered first by denominators and then by numerators.

Take the numbers formed by digits lying on the main diagonal. By Cantor's diagonal argument, a complement of the binary diagonal number is irrational.

Binary diagonal number:Binary: 0.111000111001010110000110111110101011100111111011011010000100011 ... (OEIS A257341)

Decimal: 0.8890003549695230139346719690879129310538480210179543414543465370 ... (OEIS A257342)

Continued fraction: 0, 1, 8, 110, 1, 1, 1, 4, 1, 3, 33, 3, 4, 1, 26, 1, 37, 1, 23, 2, 117, 1, 1, 5, 1, 2, 20, 12, 1, 2, 1, 4, 1, ...

Ternary diagonal number:

Ternary: 0.1002211021120000111220000002000011111210202200100001202102121112 ...

Decimal: 0.3682120994629780785967399570406755660508922444001004151748993819 ...

Continued fraction: 0, 2, 1, 2, 1, 1, 12, 1, 2, 9, 2, 6, 5, 1, 2, 2, 7, 5, 1, 1, 2, 2, 2, 6, 7, 1, 41, 1, 4, 2, 2, 3, 1, ...

#### N14 / Multidimensional scaling for vowel-consonant separation / #Algorithms / 2015.12.04

Multidimensional scaling (MDS) can be effectively used for character-level text analysis. Denote the distribution of characters appearing at position δ relative to the character a as D(a, δ). Distance between characters a and b can be defined as

ρ(D(a, +1), D(b, +1)) + ρ(D(a, -1), D(b, -1)),where ρ(Da, Db) is some measure of distance between the two distributions. MDS applied to the distance matrices obtained from texts in natural languages readily reveals clusters of vowels, consonants and punctuation marks. Following images depict the results for English, Russian and Voynich manuscript characters and the L1 metric ρ.

#### N15 / Vectors with constant and normal components / #Math / 2016.01.29

Take a 2D vector having one constant component c and the other component coming from a standard normal distribution. PDF of the distribution of the vector lengths is given by ρ(x) = 2 x Exp(-(x^{2} - c^{2}) / 2) / √τ (x^{2} - c^{2}).

The mean of the distribution is approximated by √c^{2} + c / 6 + 4 / τ with a maximum relative error of 1%.

#### N16 / Embedding hexagonal grid in a square grid / #Programming #Board_Games / 2016.02.01

A subset of square grid locations (i, j) corresponding to a hexagonal grid is given by (i + 2 j) mod 4 = 0. Neighbours of a hex have offsets (0, ±2), (±2, ±1). 3D coordinates of a hex are

x = i / 2y = (-i + 2 j) / 4

z = (-i - 2 j) / 4

This representation is easy to work with, but has poor memory efficiency, as only 1/4th of all locations are actually used. However, these extra locations can be used to store additional information, for example serve as an intermediate buffer in a cellular automata scenario.

#### N17 / Irrationality plots / #Math / 2016.02.01

Let x be a real number, a_{i} its continued fraction representation without the integer part, p_{i} / q_{i} its convergents. Let's define irrationality functions:

F_{a} = ∑ 2^{-i} / a_{i}

F_{b} = ∑ 1 / q_{i}

F_{c} = ∑ 2^{-i} Ln a_{i}

For terminating continued fractions, a_{i} is extended with +∞ terms, which does not changes the fraction value. Obviously F_{a} and F_{b} converge for all x. F_{c} converges almost everywhere except for rationals and some transcendentals with fast growing a_{i}. Numerical calculations of F_{c} based on machine precision rationals should allways yeild +∞. So for the purpose of drawing F_{c}, a_{i} is extended with Khinchin's constant representative of the typical a_{i} values instead of +∞.

_{a}(x):

F

_{b}(x):

F

_{c}(x):

Local irrationality maxima are attained at x whose continued fractions end with an infinite sequence of 1s. Such x have Stern-Brocot tree paths that eventually alternate between left and right ad infinitum, as each run of left or right steps corresponds to a continued fraction term equal to its length. Take an interval [A, B], A = An / Ad, B = Bn / Bd with A, B being unimodular: |An Bd - Ad Bn| = 1. Suppose that B is the more complex fraction in a sense that Bd ≥ Ad and Bn ≥ An. This means that in the Stern-Brocot tree, B is a right child of A. Take the left step by computing a mediant of A and B, then the right step and so on. We thus obtain a system of recurrent equations for both numerators and denominators (see InvMem 12 for a similar recursion in a different context):

Ax_{i + 1}= Bx

_{i}

Bx

_{i + 1}= Ax

_{i}+ Bx

_{i}

Solution to the above equations is a Fibonacci sequence:

Ax_{i}= α φ

^{i}+ β (-1 / φ)

^{i}

α = (Ax

_{0}+ φ Bx

_{0}) / (1 + φ)

The sought-after limit, which might be called the golden mediant, is

M_{g}(A, B) = Lim An / Ad = (An + φ Bn) / (Ad + φ Bd)

If A is the more complex fraction instead, A and B should be swapped before calculating the golden mediant. M_{g}(A, B) is a point of maximal irrationality on the unimodular interval [A, B]. Golden mediants share the worst rational approximation property (see Hurwitz's theorem) with the golden ratio.

#### N18 / Maximal chess positions / #Chess #Optimization / 2016.02.04

Following record-breaking positions were generated with Ascension framework. Amount of improvement over the previous records (see "Guide to fairy chess" by Anthony Dickins) is indicated in brackets.

168 moves by both sides, all pieces on the board (+4)

183 moves by both sides, no promotions allowed (+2)

224 moves by both sides, promotions allowed (+1)

89 captures by both sides, all pieces on the board (+1)

50 captures by white, all pieces on the board (+1)

#### N19 / Weighted minimum and maximum / #Math / 2016.02.22

How can one define a weighted minimum and maximum of the numbers a_{i} having weights w_{i}? One approach is to calculate a limit of a weighted power mean:

^{q}√∑ a

_{i}

^{q}w

_{i}

^{q}/ ∑ w

_{i}

^{p}, q → +∞ = Max(a

_{i}w

_{i}) / Max(w

_{i})

WMin = Lim

^{q}√∑ a

_{i}

^{q}w

_{i}

^{-q}/ ∑ w

_{i}

^{-q}, q → -∞ = Min(a

_{i}/ w

_{i}) / Min(1 / w

_{i})

Another idea is to interpret the normalized weight w_{i} as the probability of a_{i} existing. Normalize the weights to unit maximum by dividing by Max w_{i} . Sort a_{i} in ascending | descending order. The probability of a_{i} being the maximum | minimum is the probability of it existing and all greater | smaller values not existing:

_{i}= w

_{i}∏ (1 - w

_{j}), j > i.

The sought-after result is the mean of the resulting distribution: WMax | WMin = ∑ a_{i} p_{i}.

#### N20 / Maximal density subsquare-free arrangements / #Optimization #OpenProblem / 2016.02.22

What is the largest number of points that can be placed on a N × N grid so that no four of them form a square? Exact values are known for N up to 9 (OEIS A240443). Following solutions for N = 10 .. 16 were generated with tabu search and simulated annealing via Ascension framework.

10: 50 | 11: 58 | 12: 67 | 13: 76 |

14: 86 | 15: 95 | 16: 106 |

If only axis-parallel squares are considered, maximal numbers are known for N up to 10 (OEIS A227133). Solutions obtained for N = 11 .. 16:

11: 73 | 12: 85 | 13: 98 |

14: 112 | 15: 127 | 16: 142 |

#### N21 / Sierpinsky-like arrays / #Math / 2016.03.01

Define a 2 × 2 "seed" ((1, 1), (1, 0)). Start with A = ((1)), iteratively replace every element A_{ij} with A_{ij} × Seed. Resulting infinite array is essentially a Sierpinsky triangle. Alternatively, A_{ij} can be defined as being equal to 1 only if binary representations of i and j do not contain a '1' in the same position: A_{ij} = 1 if i AND j = 0, 0 otherwise, where AND denotes a bitwise 'and' operation.

Let a(n) = ∑ A_{ij} for 0 ≤ i, j < n. a(2 n) = 3 a(n), because after a single replacement iteration, every 1 yields a block containing three new 1s. For odd indices, the edge contains incomplete blocks that only have two 1s, so a(2 n + 1) = 2 a(n + 1) + a(n) (A064194).

Two 3D versions arise naturally.

1. Seed = (((1, 1), (1, 1)), ((1, 1), (1, 0))). All three coordinates of nonzero elements never have a '1' bit in identical position: i AND j AND k = 0. Let a(n) = ∑ A_{ijk} for 0 ≤ i, j, k < n. a(2 n) = 7 a(n). For odd n, in addition to incomplete 2 × 2 × 1 face blocks containing four 1s, there are 3 × 2^{Z(n)} incomplete 2 × 1 × 1 edge blocks only containing two 1s, where Z(n) is the number of zeroes in the binary representation of n. Thus a(2 n + 1) = 4 a(n + 1) + 3 a(n) - 6 × 2^{Z(2 n + 1)}. a(n) (A269589) is a lower bound for the 3D verson of no subsquares problem (A268239, also see InvMem 20).

2. Seed = (((1, 1), (1, 0)), ((1, 0), (0, 0))). No pair of coordinates of nonzero elements contains a '1' bit in identical position: i AND j = 0, j AND k = 0, k AND i = 0. a(2 n) = 4 a(n), a(2 n + 1) = 3 a(n + 1) + a(n) (A268524).

#### N22 / Peaceable queens / #Math #Chess #Optimization / 2016.04.03

OEIS A250000 gives the largest M such that M white and M black queens can be placed on a N × N board and queens of opposite color do not attack each other. For N ≥ 14, simulated annealing yields solutions with 28, 32, 37, 42, 47, 52, 58, 64, 70, 77, 84 queens per side.

Solutions typically consist of four polygonal regions. What happens at the limit N → ∞? Replace the N × N board by a unit square. The areas of the regions (see illustration) are:

^{2}) / 8

A2 = (1 − 4 x

^{2}− 2 y

^{2}) / 8

A = A1 + A2 = (2 x (1 − 2 x) + y (2 − 3 y)) / 4

Maximum of the total area A is 7 / 48, attained at x = 1 / 4, y = 1 / 3. Observation: with a few exceptions having a non-polygonal structure (N = 5, 9), M = Floor(7 / 48 N^{2}).

#### N23 / RGB Principal component analysis / #Math #Color / 2016.05.29

Principal component analysis applied to gamma-compressed RGB values from a large set of photographic images (MIRFLICKR25000 dataset) yielded the following transformation matrix:

0.576 | 0.585 | 0.571 |

0.721 | -0.034 | -0.692 |

-0.386 | 0.810 | -0.441 |

Relative importance of the axes, as measured by the square root of their eigenvalues, is about 1 : 1/3 : 1/7. The axes closely correspond to black - white, orange - azure (hue 30° - 210°) and green - magenta (hue 120° - 300°) directions. Normalized to unit dynamic range of the axis projections, the matrix is well approximated by

1/3 | 1/3 | 1/3 |

1/2 | 0 | -1/2 |

-1/4 | 1/2 | -1/4 |

#### N24 / Lightness approximations / #Color #Approximation / 2016.05.29

Lightness of a color can be calculated with the aid of the Lab color system, but in many cases a simple approximation would suffice. Following tables summarize such approximations. R, G, B denote gamma-compressed sRGB values. Equivalent gray level denotes the level of sRGB gray having the same lightness as the color in question. The approximations were obtained by minimizing the L_{2} and L_{∞} errors. L_{2} error is weighted with typical real-world colors (RGB data taken from the MIRFLICKR25000 dataset), and the L_{∞} error was calculated along the whole RGB cube. All the approximations give 0 when applied to black and 1 when applied to white.

Equivalent gray level approximation | Error | |

0.301 R + 0.622 G + 0.077 B | L_{2} = | 1 / 55 |

0.279 R + 0.643 G + 0.078 B | L_{∞} < | 1 / 4 |

(0.228 R^{2} + 0.696 G^{2} + 0.076 B^{2})^{1 / 2} | L_{2} = | 1 / 367 |

(0.224 R^{2} + 0.700 G^{2} + 0.076 B^{2})^{1 / 2} | L_{∞} < | 1 / 38 |

(0.215 R^{p} + 0.714 G^{p} + 0.071 B^{p})^{1 / p}, p = 2.169 | L_{2} = | 1 / 496 |

(0.234 R^{p} + 0.695 G^{p} + 0.071 B^{p})^{1 / p}, p = 2.063 | L_{∞} < | 1 / 40 |

Lightness approximation | Error | |

0.325 R + 0.641 G + 0.034 B | L_{2} = | 1 / 30 |

0.288 R + 0.633 G + 0.079 B | L_{∞} < | 1 / 4 |

(0.267 R^{2} + 0.674 G^{2} + 0.059 B^{2})^{1 / 2} | L_{2} = | 1 / 43 |

(0.232 R^{2} + 0.692 G^{2} + 0.076 B^{2})^{1 / 2} | L_{∞} < | 1 / 19 |

(0.208 R^{p} + 0.744 G^{p} + 0.048 B^{p})^{1 / p}, p = 2.909 | L_{2} = | 1 / 47 |

(0.222 R^{p} + 0.705 G^{p} + 0.073 B^{p})^{1 / p}, p = 2.134 | L_{∞} < | 1 / 24 |

(0.213 R^{p} + 0.730 G^{p} + 0.057 B^{p})^{1 / q}, p = 2.211, q = 2.367 | L_{2} = | 1 / 76 |

#### N25 / Feature scaling / #Math / 2016.10.15

Feature scaling directly affects the results of dimensionality reduction methods such as PCA and MDS. Standardization (rescaling to zero mean and unit variance) is the most common method. But if all the features belong to a ratio scale (have a meaningful zero), one can divide by the mean or the root mean square instead.

Normalization factor | Feature importance |
---|---|

1 | σ |

σ | 1 |

μ | CV |

√⟨x^{2}⟩ | CV / √1 + CV^{2} |

A feature's impact is proportional to its standard deviation after the normalization. Standardization assigns equal importance to all features, while the unit mean and RMS normalization emphasize features with a higher coefficient of variation (CV = σ / μ), which may be more appropriate if relative differences are important. Note that the RMS normalization guarantees bounded feature importance. If all the features have the same units of measurement, the normalization can be skipped, although it would likely be beneficial if the features significantly differ in scale.

Another potential issue is the presence of highly correlated features. Duplicating a feature adds no new information but will distort the results of PCA or MDS. Consider a dataset having D uncorrelated features. Replace each feature i by M_{i} duplicates. The data will now lie in a subspace spanned by D oblique axes and each feature will be inflated by a factor of √M_{i}.

Collinear inflation correction (CIC) factors applicable in the general case have the form F_{i} = √∑ Corr^{2}(i, j), where Corr(i, j) is Pearson's correlation between features i and j. CIC provides a principled and robust alternative to discarding highly correlated features. It is independent of the feature normalization and should be performed in conjunction with it.

If you appreciate the effort put into creating this page and would like to see more frequent updates, you can support me on Patreon. Top supporters this month:

- Glenn Fiedler
- Neil J. A. Sloane
- Brian Bucklew
- Anton Shepelev