VizualBusiness: A Fourier-free proof of the Furstenberg-Sarkozy theorem

Friday, March 1, 2013

A Fourier-free proof of the Furstenberg-Sarkozy theorem

A Fourier-free proof of the Furstenberg-Sarkozy theorem:
The following result is due independently to Furstenberg and to Sarkozy:

Theorem 1 (Furstenberg-Sarkozy theorem) Let ${\delta > 0}$ , and suppose that is sufficiently large depending on ${\delta}$ . Then every subset of ${[N] := \{1,\ldots,N\}}$ of density at least ${\delta}$ contains a pair ${n, n+r^2}$ for some natural numbers with ${r \neq 0}$ .

This theorem is of course similar in spirit to results such as Roth’s theorem or Szemerédi’s theorem, in which the pattern ${n,n+r^2}$ is replaced by {n,n+r,n+2r}

or ${n,n+r,\ldots,n+(k-1)r}$ for some fixed {k}

respectively. There are by now many proofs of this theorem (see this recent paper of Lyall for a survey), but most proofs involve some form of Fourier analysis (or spectral theory). This may be compared with the standard proof of Roth’s theorem, which combines some Fourier analysis with what is now known as the density increment argument.

A few years ago, Ben Green, Tamar Ziegler, and myself observed that it is possible to prove the Furstenberg-Sarkozy theorem by just using the Cauchy-Schwarz inequality (or van der Corput lemma) and the density increment argument, removing all invocations of Fourier analysis, and instead relying on Cauchy-Schwarz to linearise the quadratic shift ${r^2}$ . As such, this theorem can be considered as even more elementary than Roth’s theorem (and its proof can be viewed as a toy model for the proof of Roth’s theorem). We ended up not doing too much with this observation, so decided to share it here.

The first step is to use the density increment argument that goes back to Roth. For any ${\delta > 0}$ , let ${P(\delta)}$ denote the assertion that for {N}

sufficiently large, all sets ${A \subset [N]}$ of density at least ${\delta}$ contain a pair ${n,n+r^2}$ with {r}

non-zero. Note that ${P(\delta)}$ is vacuously true for ${\delta > 1}$ . We will show that for any ${0 < \delta_0 \leq 1}$ , one has the implication

$\displaystyle P(\delta_0 + c \delta_0^4) \implies P(\delta_0) \ \ \ \ \ (1)$

for some absolute constant {c>0}

. This implies that ${P(\delta)}$ is true for any ${\delta>0}$ (as can be seen by considering the infimum of all ${\delta>0}$ for which ${P(\delta)}$ holds), which gives Theorem 1.

It remains to establish the implication (1). Suppose for sake of contradiction that we can find ${0 < \delta_0 \leq 1}$ for which ${P(\delta_0+c\delta^3_0)}$ holds (for some sufficiently small absolute constant {c>0}

), but ${P(\delta_0)}$ fails. Thus, we can find arbitrarily large {N}

, and subsets {A}

of density at least ${\delta_0}$ , such that {A}

contains no patterns of the form ${n,n+r^2}$ with {r}

non-zero. In particular, we have

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} 1_A(n) 1_A(n+(r+h)^2) = 0.$

(The exact ranges of {r}

and

are not too important here, and could be replaced by various other small powers of {N}

if desired.)
Let ${\delta := |A|/N}$ be the density of {A}

, so that ${\delta_0 \leq \delta \leq 1}$ . Observe that

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} 1_A(n) \delta 1_{[N]}(n+(r+h)^2) = \delta^2 + O(N^{-1/3})$

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} \delta 1_{[N]}(n) \delta 1_{[N]}(n+(r+h)^2) = \delta^2 + O(N^{-1/3})$

and

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} \delta 1_{[N]}(n) 1_A(n+(r+h)^2) = \delta^2 + O( N^{-1/3} ).$

If we thus set ${f := 1_A - \delta 1_{[N]}}$ , then

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} f(n) f(n+(r+h)^2) = -\delta^2 + O( N^{-1/3} ).$

In particular, for {N}

large enough,

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} |\mathop{\bf E}_{h \in [N^{1/100}]} f(n+(r+h)^2)| \gg \delta^2$

and hence by the Cauchy-Schwarz inequality

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} |\mathop{\bf E}_{h \in [N^{1/100}]} f(n+(r+h)^2)|^2 \gg \delta^4$

which we can rearrange as

$\displaystyle |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h,h' \in [N^{1/100}]} \mathop{\bf E}_{n \in [N]} f(n+(r+h)^2) f(n+(r+h')^2)| \gg \delta^4.$

Shifting

by ${(r+h)^2}$ we obtain (again for {N}

large enough)

$\displaystyle |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h,h' \in [N^{1/100}]} \mathop{\bf E}_{n \in [N]} f(n) f(n+(h'-h)(r+h'+h))| \gg \delta^4.$

In particular, by the pigeonhole principle (and deleting the diagonal case {h=h'}

, which we can do for {N}

large enough) we can find distinct ${h,h' \in [N^{1/100}]}$ such that

$\displaystyle |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{n \in [N]} f(n) f(n+(h'-h)(r+h'+h))| \gg \delta^4,$

so in particular

$\displaystyle \mathop{\bf E}_{n \in [N]} |\mathop{\bf E}_{r \in [N^{1/3}]} f(n+(h'-h)(r+h'+h))| \gg \delta^4.$

If we set

and shift

, we can simplify this as

$\displaystyle \mathop{\bf E}_{n \in [N]} |\mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr)| \gg \delta^4. \ \ \ \ \ (2)$

On the other hand, since

$\displaystyle \mathop{\bf E}_{n \in [N]} f(n) = 0$

we have

$\displaystyle \mathop{\bf E}_{n \in [N]} f(n+dr) = O( N^{-1/3+1/100})$

for any ${r \in [N^{1/3}]}$ , and thus

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr) = O( N^{-1/3+1/100}).$

Averaging this with (2) we conclude that

$\displaystyle \mathop{\bf E}_{n \in [N]} \max( \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr), 0 ) \gg \delta^4.$

In particular, by the pigeonhole principle we can find ${n \in [N]}$ such that

$\displaystyle \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr) \gg \delta^4,$

or equivalently {A}

has density at least ${\delta+c'\delta^4}$ on the arithmetic progression ${\{ n+dr: r \in [N^{1/3}]\}}$ , which has length ${\lfloor N^{1/3}\rfloor }$ and spacing {d}

, for some absolute constant {c'>0}

. By partitioning this progression into subprogressions of spacing ${d^2}$ and length ${\lfloor N^{1/4}\rfloor}$ (plus an error set of size ${O(N^{1/4})}$ , we see from the pigeonhole principle that we can find a progression ${\{ n' + d^2 r': r' \in [N^{1/4}]\}}$ of length ${\lfloor N^{1/4}\rfloor}$ and spacing ${d^2}$ on which {A}

has density at least ${\delta + c\delta^4}$ (and hence at least ${\delta_0+c\delta_0^4}$ ) for some absolute constant {c>0}

. If we then apply the induction hypothesis to the set

$\displaystyle A' := \{ r' \in [N^{1/4}]: n' + d^2 r' \in A \}$

we conclude (for {N}

large enough) that {A'}

contains a pair ${m, m+s^2}$ for some natural numbers {m,s}

with

non-zero. This implies that ${(n'+d^2 m), (n'+d^2 m) + (|d|s)^2}$ lie in {A}

, a contradiction, establishing the implication (1).
A more careful analysis of the above argument reveals a more quantitative version of Theorem 1: for ${N \geq 100}$ (say), any subset of {[N]}

of density at least ${C/(\log\log N)^{1/3}}$ for some sufficiently large absolute constant {C}

contains a pair ${n,n+r^2}$ with {r}

non-zero. This is not the best bound known; a (difficult) result of Pintz, Steiger, and Szemeredi allows the density to be as low as ${C / (\log N)^{\frac{1}{4} \log\log\log\log N}}$ . On the other hand, this already improves on the (simpler) Fourier-analytic argument of Green that works for densities at least ${C/(\log\log N)^{1/11}}$ .

Remark 1 A similar argument also applies with ${n,n+r^2}$ replaced by ${n,n+r^k}$ for fixed , because this sort of pattern is preserved by affine dilations ${r' \mapsto n'+d^k r'}$ into arithmetic progressions whose spacing ${d^k}$ is a ${k^{th}}$ power. By re-introducing Fourier analysis, one can also perform an argument of this type for ${n,n+r^2,n+2r^2}$ ; see the above-mentioned paper of Green for details. However there seems to be some technical difficulty in extending it to patterns of the form for polynomials that consist of more than a single monomial (and with the normalisation , to avoid local obstructions), because one no longer has this preservation property.