Updated: 2017-10-10T00:17:46Z
My style of Haskell programming has been evolving over the 15 years that I have been working with it. It is turning into something that I would like to call “functor oriented programming”.

The traditional use of typed functional programming focuses on data types. One defines data types to model the data structures that your program operates on, and one writes functions to transform between these structures. One of the primary goals in this traditional methodology is to create data structures that exclude erroneous states to the extent that is reasonably possible. As long as one ensures that pattern matching is complete, then the type system catches many errors that would otherwise lead to these erroneous states, which have been crafted to be unrepresentable.

Functor oriented programming is a refinement of this traditional focus on data types. I was reminded of this concept recently when I was working with wren’s fantastic unification-fd library. With functor oriented programming, one divides data structures into layers of functors that, when composed together, form the data structures that your program operates on. Instead of writing transformations between data structures, one writes natural transformations between functors, where a natural transformation between functors F and G is a polymorphic function of type forall a. F a -> G a. While traditional functions often operate on products of multiple inputs and/or outputs, with functor oriented programming one will often see functions operating on compositions of functors, including but not limited to distribution functions of type forall a. F (G a) -> G (F a), half-way distribution functions of type forall a. F (G a) -> G (H a), and many others.

By dividing data structures up into layers of functors, one can create a separation of concerns that does not occur in traditional functional programming. With functor oriented programming, polymorphism is not necessarily about using functions polymorphically.
Instead, polymorphism provides correctness guarantees by ensuring that a particular function can only touch the specific layers of functors in its type signature and is independent of the rest of the data structure. One benefits from polymorphism even when a function is only ever invoked at a single type.

The appearance of many natural transformations is one hallmark of functor oriented programming. Higher-order natural transformations will involve Rank2Types and RankNTypes, which is another hallmark of functor oriented programming. Other hallmarks of functor oriented programming include open recursive types, which allow one to divide up recursive types into their layers of recursion and create natural transformations that operate on a single layer at a time. Open recursive types play an important role in wren’s unification library.

As fine a language as Haskell is, it is not actually that suitable for functor oriented programming. The problem is that, under normal circumstances, there are no reductions or equivalence classes at the type level. For example, the identity functor does not transparently disappear during composition, the Compose functor is not transparently associative, and the Swap functor composed with itself does not reduce to the identity functor. To cope with this one must litter one’s code with newtype wrappers and unwrappers to make all these natural transformations explicit. In principle, these transformations should have no run-time consequences, but when they are used in higher-order ways, unfortunately they sometimes do. Despite these problems, I am not aware of any other practical language that better supports this style of programming. I think Haskell’s higher-kinded type classes and the progression of the Monad, Applicative, Foldable, Traversable, etc. classes have been instrumental in the development of this style of programming, as they further motivate the division of one’s data structures into these layers of functors.
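To make the idea concrete, here is a minimal sketch (the names safeHead and distMaybe are my own illustrations, not from the post) of a natural transformation and a distribution function in Haskell:

```haskell
{-# LANGUAGE RankNTypes, TypeOperators #-}

-- A natural transformation between functors f and g is a polymorphic
-- function of type forall a. f a -> g a.
type f ~> g = forall a. f a -> g a

-- The head of a list, as a natural transformation from [] to Maybe.
safeHead :: [] ~> Maybe
safeHead []      = Nothing
safeHead (x : _) = Just x

-- A distribution function: pushing a Maybe layer out through a list layer.
distMaybe :: forall a. [Maybe a] -> Maybe [a]
distMaybe = sequenceA
```

Because safeHead is polymorphic in the element type, the type system guarantees it can only rearrange the list layer; it provably cannot inspect or fabricate elements, which is exactly the separation of concerns described above.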
I have been thinking about writing this post for a few years now, and w[...]

2017-10-08T22:27:03Z
A while back I decided to try my hand at using Agda and I wrote a proof of the Church-Rosser theorem in it. It was a fun exercise. I took all the knowledge I have picked up over the years about using dependent types to represent binders, to define well-typed terms, and what I have learned about the Category of Contexts, and applied it to my definition of Terms for a simply typed lambda calculus. I am proud of the result. They say you do not understand a topic in mathematics until you can teach it. And you do not really understand it until you can prove it in Coq. And you do not really really understand it until you can prove it in Agda.

What really struck me was how my exercise in Agda affected my understanding of “Propositions as Types”. I view types as being divided into roughly two categories. Some types are propositions. They are the types that have at most one inhabitant, which is to say types for which you can prove that all their inhabitants are equal. Other types are data types. They are types with potentially more than one inhabitant. As such you can distinguish between values by (possibly indirectly) doing case analysis on them. Indeed, HoTT defines propositions in exactly this way. This classification of types is not fundamental in the theory of types. The theory of types treats both propositions and data types uniformly. I simply find it a useful way of characterizing types when programming and reasoning about programs with dependent type theory.

The void and unit types, ⊥ and ⊤ respectively, are both propositions. We can define a function from the Boolean type to the universe of types which maps the two Boolean values to these two types. In this way we can convert any Boolean valued (computable) predicate into a logical (type-theoretical) predicate. To me the phrase “Propositions as Types” just meant the embedding of logical propositions as types with at most one inhabitant.
For example, given a decidable type A, we can compute whether a given value of type A is a member of a given list of As. This computable predicate can be lifted to a logical predicate to define a logical membership relationship. Expressions using this logical membership relationship are propositions according to the above definition of proposition. This is a fine way to do formal reasoning. However, this is not the way that membership is typically defined in Agda. Instead, Agda defines the membership relation as an inductive family whose constructors witness that an element is either the head of the list, or is a member of the tail of the list. (Coq also defines the membership relation this way; however it is marked as non-extractable for computation by virtue of being a proposition.)

The type (a ∈ l) is, in general, a data type rather than a proposition. When the value a occurs multiple times in l, then the type (a ∈ l) has multiple different “proofs” corresponding to the different occurrences of a within l. Values of this type act as “the type of indexes where a occurs in l”, and one can write programs that operate over this type.

Given two lists, l1 and l2, the “proposition” that one list is a subset of the other states that any element of l1 is also an element of l2:

    l1 ⊆ l2 ≔ ∀a. a∈l1 → a∈l2

In dependent type theory this implication is represented as a function. Because our membership relation is a data type, this subset relation is represented by a real program. Specifically it is a program that, for any value, maps each index where it occurs in l1 to some index where it occurs in l2; you can really evaluate this function. This subset type is also, in general, a data type because there can be multiple such functions, which represent all the possible permutations that map l1 onto l2.

The consequences of this are fantastic. For example, what you normally think of as a theorem for weakening, weaken : ∀ {Γ₁ Γ₂ A} → Γ₁ ⊆ Γ₂ → [...]

2017-06-16T11:45:46Z
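The membership-as-data-type idea can also be sketched in Haskell (a hypothetical translation of mine using DataKinds; the original is an Agda inductive family): a value of Elem x xs is precisely an index at which x occurs in xs, and a subset witness is a real program on such indices.

```haskell
{-# LANGUAGE DataKinds, GADTs, PolyKinds, RankNTypes, TypeOperators #-}

-- Evidence that x occurs in the type-level list xs; each distinct
-- value of this type is a distinct occurrence of x.
data Elem (x :: k) (xs :: [k]) where
  Here  :: Elem x (x ': xs)
  There :: Elem x xs -> Elem x (y ': xs)

-- A subset witness is a program mapping occurrence indices in xs
-- to occurrence indices in ys.
type Subset xs ys = forall x. Elem x xs -> Elem x ys

-- Weakening by one extra element is such a program.
weaken1 :: Subset xs (y ': xs)
weaken1 = There

-- Each Elem value really is an index into the list.
toIndex :: Elem x xs -> Int
toIndex Here      = 0
toIndex (There e) = 1 + toIndex e
```

Running toIndex on a membership proof recovers the position it denotes, which is what makes these “propositions” computationally meaningful.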
Recently I was thinking about a programming problem that would need access to random values. I thought it might be fun to write up my thought process as an advanced Haskeller while working through this particular problem.

In Haskell, one would write such a program by using a random monad to access an oracle providing random numbers. The traditional way to implement MonadRandom is using the state monad. The Gen type holds the state of the (pseudo-)random number generator, and the randomInt function returns a new random number and updates the state of the generator.

    type Random a = State Gen a

    randomInt :: Gen -> (Int, Gen)

    randomOracle :: Random Int
    randomOracle = state randomInt

Then I write my program inside the Random monad, making calls to the randomOracle as needed.

    myProg :: Random Result
    myProg = do
      {- ... -}
      x <- randomOracle
      {- ... -}
      y <- randomOracle
      {- ... -}

In order to run my program, I need to provide it with a random seed.

    evalRandom :: Random result -> Gen -> result
    evalRandom = evalState

For deterministic testing, we can pass fixed generators to evalRandom. If we use StdGen, we can map our Random program to IO and use the system random number generator.

    type Gen = System.Random.StdGen

    randomInt = System.Random.random

    evalRandomIO :: Random result -> IO result
    evalRandomIO = getStdRandom . runState

For the most general possible random number generator, the type for the generator state is simply an infinite stream of random values.

    data Stream a = Cons a (Stream a)

    unfoldStream :: (g -> (a, g)) -> g -> Stream a
    unfoldStream next = go
      where
        go seed = Cons value (go nextSeed)
          where
            (value, nextSeed) = next seed

    type Gen = Stream Int

    randomInt (Cons hd tl) = (hd, tl)

    evalRandomStdGen :: Random result -> StdGen -> result
    evalRandomStdGen rr = evalRandom rr . unfoldStream System.Random.random

    evalRandomIO :: Random result -> IO result
    evalRandomIO rr = evalRandomStdGen rr <$> newStdGen

Before, when I was an intermediate Haskeller, I would probably have stopped at this point, pretty satisfied with this. And let me be clear that this is a fine solution. However, now that I am an advanced Haskeller, I cannot help but feel a little dissatisfied with it. The problem with this implementation of the Random monad is that the type is too broad. Since the Random type is the State monad, there are operations allowed by the type that should not be allowed for a Random program. For instance, within the Random type, someone could store the state of the generator and restore it later, causing random values to be replayed, or someone might completely replace the state of the generator with their own value.

One way to solve this problem is to use Haskell’s module system to abstract the monad and only expose the randomOracle operation. While this is a reasonable solution (in fact it is a very good solution, as we will see), it would be nicer if we could instead use the type system to create a monad that is only capable of representing the programs we want to allow, and disallows other programs that would try to manipulate the generator state in ways we do not want. Essentially we want our Random programs to only be able to query the random oracle, and that is it.

After reflecting on this problem and the various kinds of monads I know about, I realized that a suitable free monad captures exactly this notion of providing only an operation to query the random oracle. Specifically we want Control.Monad.Free.Free (Reader Int), or more directly (but more obtusely) written as Control.Monad.Free.Free ((->) Int). We truly want a free monad because any sequence of responses from the random oracle is valid. One problem with this Free monad is that the bind operation can be slow, because it needs to traverse through a, possibly long, data structure.
There are several solutions to this, but for this particular free monad, I happen to know that the van Laarhoven free monad representation is possible: the type forall m. Monad m => m In[...]

2015-12-10T03:37:09Z
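As a sketch of what the free monad over ((->) Int) looks like, here it is hand-rolled (rather than imported from the free package), so the instances are spelled out:

```haskell
-- The free monad over ((->) Int): a program that can do nothing
-- except return a value or consult the random oracle for an Int.
data Random a = Pure a | Ask (Int -> Random a)

instance Functor Random where
  fmap f (Pure a) = Pure (f a)
  fmap f (Ask k)  = Ask (fmap f . k)

instance Applicative Random where
  pure = Pure
  Pure f <*> r = fmap f r
  Ask k  <*> r = Ask ((<*> r) . k)

instance Monad Random where
  Pure a >>= f = f a
  Ask k  >>= f = Ask ((>>= f) . k)

-- The only primitive operation: query the oracle.
randomOracle :: Random Int
randomOracle = Ask Pure

-- Run a program against a stream of oracle responses (a plain list
-- here for simplicity; an infinite stream in practice).
evalRandom :: Random a -> [Int] -> a
evalRandom (Pure a) _        = a
evalRandom (Ask k)  (x : xs) = evalRandom (k x) xs
evalRandom (Ask _)  []       = error "oracle exhausted"
```

By construction, no program of type Random a can save, restore, or replace the generator state; the type only permits querying the oracle, which is exactly the restriction the State-based version failed to enforce.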
Bell’s Casino Problem

A new casino has opened up in town named “Bell’s Casino”. They are offering a coin game. The game works as follows. The house will commit two coins on the table, oriented heads or tails each, and keep them covered. The player calls what the faces of each of the coins are, either HH, HT, TH, or TT. The casino reveals the coins and if the player is correct, they win $1, and otherwise they lose $1.

Problem 1. Prove that there is no strategy that can beat the casino.

Solution to problem 1. Let pHH be the probability that the casino orients the coins as HH, and similarly for pHT, pTH, and pTT. We have pHH + pHT + pTH + pTT = 1. If the player calls the orientations XY, then they win $1 with probability pXY, and lose $1 with probability 1 - pXY. This means the total expected value is 2*pXY - 1 dollars. The casino can minimize the player’s expected value by choosing each of the four possible orientations with equal probability of 25%. In that case the player’s expected value is $-0.50 no matter what orientation they choose to call.

After opening, the customers stop coming by to play this boring game, so to boost attendance the casino modifies the game as follows. The house will commit two coins on the table, oriented heads or tails each, and keep them covered. The player calls what the faces of each of the two coins are, either HH, HT, TH, or TT. The casino reveals one coin, of the player’s choice. After seeing the revealed coin, the player can elect to back out of the game and neither win nor lose, or keep going and see the second coin. If the player’s call is correct, they win $1, and otherwise they lose $1.

Problem 2. Prove that there is no strategy that can beat the casino.

Solution to problem 2. Without loss of generality, suppose the player calls HH and they ask for the first coin to be revealed. With probability pTH + pTT the first coin is tails.
At this point the player’s best option is to elect to back out of the game, as it is now impossible for them to win. With probability pHH + pHT the first coin is heads. At this point the player can back out or continue. If the player always continues, then they win $1 with probability pHH and they lose $1 with probability pHT. Their total expected value is pHH - pHT dollars. As before, the casino can choose each of the four possible orientations with equal probability of 25%. In that case the player’s expected value is $0 if they choose to continue when the revealed coin matches their call, and it is also $0 if they choose to always back out of the game. No matter what strategy the player chooses, they cannot beat the casino as long as the casino chooses each of the four possible orientations with equal probability.

Even with the new, fairer, game, attendance at the casino starts dropping off again. The casino decides to offer a couples game. The house will commit two coins on two tables, oriented heads or tails each, and keep them covered. The couple, together, calls what the faces of each of the two coins are, either HH, HT, TH, or TT. Then, each player in the couple gets to see one coin each. Collectively they get to decide whether they are going to back out of the game or not by the following method. After seeing their revealed coin, each player will raise either a black flag or a red flag. If the players raise different colour flags, the game ends and no one wins or loses. If both players raise the same colour flag, the game keeps going. If the couple’s original call was right, they win $1, and otherwise, they lose $1. To ensure that the couple cannot cheat, the two tables are placed far enough apart such that each player’s decision on which flag to raise is space-like separated.
Specifically the tables are placed 179 875 475 km apart and each player has 1 minute to decide which flag to raise, otherwise a black flag will be raised on their behalf (or, more realistically, the tables are pla[...]

2015-11-16T04:20:26Z
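The expected values computed in the two solutions can be checked with a quick calculation (a sketch of mine, with the casino choosing each orientation uniformly, as in the solutions):

```haskell
-- Game 1: win $1 with probability pXY = 1/4, lose $1 otherwise.
evGame1 :: Double
evGame1 = pXY * 1 + (1 - pXY) * (-1)
  where pXY = 0.25

-- Game 2, calling HH and revealing the first coin: back out on tails
-- (probability pTH + pTT = 1/2), otherwise continue to the end.
evGame2 :: Double
evGame2 = pTails * 0 + pHH * 1 + pHT * (-1)
  where
    pTails = 0.5
    pHH    = 0.25
    pHT    = 0.25
```

These reproduce the $-0.50 and $0 expected values stated above: the second game is fair but still cannot be beaten.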
It is time to announce the results from Stochastic Elections Canada for the 42nd General Election. Every vote counts with the stochastic election process, so we had to wait until all election results were validated and certified before we could announce our results. However, stochastic election results are not very sensitive to small changes in the number of votes counted. The distributions for each candidate are typically only slightly adjusted. Now that the last judicial recount has been completed, we can announce our MP selection.

2015 Stochastic Election Simulation Results

    Party                       Seats  Seat Percentage  Vote Percentage
    Liberal                       139            41.1%            39.5%
    Conservative                  105            31.1%            31.9%
    NDP-New Democratic Party       63            18.6%            19.7%
    Bloc Québécois                 19            5.62%            4.66%
    Green Party                    11            3.25%            3.45%
    Christian Heritage Party        1           0.296%          0.0870%

Results by province and by riding are available (electoral districts on page 2). The results were generated from Elections Canada data. One hundred and seventy-four elected candidates differ from the actual 2015 election outcome.

The Christian Heritage Party holds the balance of power in this parliament. Assuming a Liberal party member becomes speaker of the house, the Liberals together with the Bloc Québécois and Green Party have 168 votes, and the Conservatives and NDP together have 168 votes. In this case, it is the Christian Heritage Party vote that would break the tie.

Unfortunately, Liberal leader Justin Trudeau, with 52.0% of the vote in the riding of Papineau, still lost to Maxime Claveau of the Bloc Québécois with 12.2% of the vote. Apparently it is now the Bloc Québécois’s turn to represent their support in Papineau. If Justin Trudeau wants to be prime minister, his best bet is to try to be appointed to the Senate and rule from there. Similarly, NDP leader Tom Mulcair lost to Liberal candidate Rachel Bendayan in the riding of Outremont. Perhaps there is a deal to be struck between the Liberals and NDP to get their leaders appointed to the Senate.
This is only one example of the results of a stochastic election. Because of the stochastic nature of the election process, actual results may differ. In Canada’s election process, it is sometimes advantageous to not vote for one’s preferred candidate. The stochastic election system is the only system in which it is always best to vote for your preferred candidate. Therefore, if the 2015 election had actually used a stochastic election system, people would have been allowed to vote for their true preferences. The outcome could be somewhat different than what this simulation illustrates.

Related info:
- 2015 stochastic election expected results
- 2011 stochastic election results
- 2008 stochastic election results
- 2006 stochastic election results
- 2004 stochastic election results

[...]

2015-10-21T01:56:05Z
The rule of the people has the fairest name of all, isonomia, and does none of the things that a monarch does. The lot determines offices, power is held accountable, and deliberation is conducted in public. — Herodotus

In Athenian democracy, sortition was used to select their magistrates in order to avoid the oligarchs buying their way into the office. What would happen if we used a form of sortition to select our parliament? Since most people are too busy and unprepared to sit in parliament, I propose the next best thing: the drawing of lots in a riding to select a person to choose the representative for the riding.

What would happen? The resulting system is a unique system that provides local representation and approximately proportional representation. Each party gets a chance to represent a riding in rough proportion to the amount of support they have in the riding. Democracy means “rule of the people”, not “rule of the majority” (nor “rule of the plurality”). Not only is it perfectly democratic for the minority to get an opportunity to be represented in parliament, it is more democratic than what we have in Canada now.

Of course, directly selecting a random person in a riding is fraught with difficulties, so instead one would vote, as we do now, for one’s preferred candidate. Then, once the votes are tallied, a candidate is selected randomly with probability proportional to the votes they received. In this system it is always best to vote for your preferred candidate. There will be no more strategic votes or vote splitting. Voting participation would go up since every vote increases the chances of your preferred candidate being selected. The resulting parliament will be close to the proportion of the number of votes received for each party, without having MPs selected from a party list.

Imagine a world where we have Stochastic Elections Canada.
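The selection rule is easy to make precise. A minimal sketch (the names are my own, not from the post) that picks a candidate with probability proportional to their vote count:

```haskell
import System.Random (randomRIO)

-- Pure core: the candidate who owns the r-th ballot, for 1 <= r <= total.
selectAt :: Int -> [(String, Int)] -> String
selectAt r ((name, votes) : rest)
  | r <= votes = name
  | otherwise  = selectAt (r - votes) rest
selectAt _ [] = error "ballot index out of range"

-- Draw a uniformly random ballot and return its candidate, so each
-- candidate wins with probability proportional to their vote count.
selectCandidate :: [(String, Int)] -> IO String
selectCandidate results = do
  r <- randomRIO (1, sum (map snd results))
  pure (selectAt r results)
```

Because every additional ballot enlarges its candidate's share of the draw, voting for anyone other than your true preference can only lower the chance of your preferred candidate winning the riding.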
Stochastic Election law requires that all counts be validated and recounted, if requested, before seat selection takes place. Because every vote influences the outcome, we must await the return of the writs, scheduled by electoral law for Monday, November 9, 2015. For now, we can bring you our seat expectation chart based on preliminary 2015 election results:

Expected Seat Distribution

    Party                                                     Expected Number of Seats (95% confidence)
    Animal Alliance/Environment Voters                        0 – 1
    ATN                                                       0
    Bloc Québécois                                            9 – 22
    Canada Party                                              0
    CAP                                                       0
    Christian Heritage Party                                  0 – 2
    Communist                                                 0 – 1
    Conservative                                              91 – 122
    Democratic Advancement                                    0 – 1
    Forces et Démocratie - Allier les forces de nos régions   0 – 1
    Green Party                                               5 – 18
    Liberal                                                   119 – 153
    Libertarian                                               0 – 3
    Marxist-Leninist                                          0 – 1
    NDP-New Democratic Party                                  54 – 81
    PACT                                                      0
    PC Party                                                  0 – 1
    Pirate                                                    0
    Radical Marijuana                                         0 – 1
    Rhinoceros                                                0 – 1
    Seniors Party                                             0
    The Bridge                                                0
    United Party                                              0
    Independent                                               0 – 3
    No Affiliation                                            0 – 1

(The original chart also showed a distribution shape per party, which does not survive in text form.)

Related info:
- 2011 stochastic election results
- 2008 stochastic election results
- 2006 stochastic election results
- 2004 stochastic election results

[...]

2015-08-27T02:03:14Z
I am a big fan of E. T. Jaynes. His book Probability Theory: The Logic of Science is the only book on statistics that I ever felt I could understand. Therefore, when he appears to rail against the conclusions of Bell’s theorem in his paper “Clearing up Mysteries—The Original Goal”, I take him seriously. He suggests that perhaps there could be a time-dependent hidden variable theory that could yield the outcomes that quantum mechanics predicts.

However, after reading Richard D. Gill’s paper, “Time, Finite Statistics, and Bell’s Fifth Position”, it is very clear that there can be nothing like a classical explanation that yields quantum predictions, time-dependent or otherwise. In this paper Gill reintroduces Steve Gull’s computer network, where a pair of classical computers is tasked to recreate the probabilities predicted in a Bell-CHSH delayed choice experiment. The catch is that the challenger gets to choose the stream of bits sent to each of the two spatially separated computers in the network. These bits represent the free choice an experimenter running a Bell-CHSH experiment has to choose which polarization measurements to make. No matter what the classical computers do, no matter how much time-dependent fiddling you want to do, they can never produce correlations that will violate the Bell-CHSH inequality in the long run. This is Gull’s “You can’t program two independently running computers to emulate the EPR experiment” theorem.

Gill presents a nice analogy with playing roulette in the casino. Because of the rules of roulette, there is no computer algorithm that can implement a strategy that will beat the house in roulette in the long run. Gill goes on to quantify exactly how long the long run is, in order to place a wager against people who claim they can recreate the probabilities predicted by quantum mechanics using a classical local hidden variable theory.
Using the theory of supermartingales, one can bound the likelihood of seeing the Bell-CHSH inequality violated by chance by any classical algorithm, in the same way that one can bound the likelihood of long winning streaks in roulette games.

I liked the casino analogy so much that I decided to rephrase Gull’s computer network as a coin guessing casino game I call Bell’s Casino. We can prove that any classical strategy, time-dependent or otherwise, simply cannot beat the house at that particular game in the long run. Yet there is a strategy where the players employ entangled qubits and beat the house on average. This implies there cannot be any classical phenomena that yields quantum outcomes. Even if one proposes some classical oscillating (time-dependent) hidden variable vibrating at such a high rate that we could never practically measure it, this theory still could not yield quantum probabilities, because such a theory implies we could simulate it with Gull’s computer network. Even if our computer simulation were impractically slow, we could still, in principle, deploy it against Bell’s Casino to beat their coin game. But no such computer algorithm exists, in exactly the same way that there is no computer algorithm that will beat a casino at a fair game of roulette. The fact that we can beat the casino by using qubits clearly proves that qubits and quantum physics are something truly different.

You may have heard the saying that “correlation does not imply causation”. The idea is that if outcomes A and B are correlated, then either A causes B, or B causes A, or there is some other C that causes A and B. However, in quantum physics there is a fourth possibility. We can have correlation without causation.

In light of Gull and Gill’s ironclad argument, I went back to reread Jaynes’s “Clearing up Mysteries”. I wanted to understand how Jaynes could have been so mistaken. After[...]

2015-08-16T18:56:21Z
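The finite core of Gill’s long-run claim is that no assignment of predetermined ±1 outcomes can push the CHSH expression past 2. A small sketch (my own illustration, not from the papers) enumerates every deterministic local strategy:

```haskell
-- The CHSH expression E(a,b) + E(a,b') + E(a',b) - E(a',b') for one
-- deterministic strategy: each side fixes an outcome in {-1, +1} for
-- each of its two measurement settings.
chsh :: (Int, Int) -> (Int, Int) -> Int
chsh (a, a') (b, b') = a*b + a*b' + a'*b - a'*b'

-- The best any deterministic local strategy can achieve; randomized
-- classical strategies are mixtures of these, so they do no better.
maxClassicalCHSH :: Int
maxClassicalCHSH =
  maximum [ chsh (a, a') (b, b') | a <- pm, a' <- pm, b <- pm, b' <- pm ]
  where pm = [-1, 1]
```

Quantum strategies can reach 2√2, which is why players holding entangled qubits can beat Bell’s Casino on average while every classical algorithm must lose.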
A new casino has opened up in town named “Bell’s Casino”. They are offering a coin game. The game works as follows.
The house will commit two coins on the table, oriented heads or tails each, and keep them covered. The player calls what the faces of each of the coins are, either HH, HT, TH, or TT. The casino reveals the coins and if the player is correct, they win $1, and otherwise they lose $1.
After opening, the customers stop coming by to play this boring game, so to boost attendance the casino modifies the game as follows.
The house will commit two coins on the table, oriented heads or tails each, and keep them covered. The player calls what the faces of each of the two coins are, either HH, HT, TH, or TT. The casino reveals one coin, of the player’s choice. After seeing the revealed coin, the player can elect to back out of the game and neither win nor lose, or keep going and see the second coin. If the player’s call is correct, they win $1, and otherwise they lose $1.
Even with the new, fairer, game, attendance at the casino starts dropping off again. The casino decides to offer a couples game.
The house will commit two coins on two tables, oriented heads or tails each, and keep them covered. The couple, together, calls what the faces of each of the two coins are, either HH, HT, TH, or TT. Then, each player in the couple gets to see one coin each. Collectively they get to decide whether they are going to back out of the game or not by the following method. After seeing their revealed coin, each player will raise either a black flag or a red flag. If the players raise different colour flags, the game ends and no one wins or loses. If both players raise the same colour flag, the game keeps going. If the couple’s original call was right, they win $1, and otherwise, they lose $1. To ensure that the couple cannot cheat, the two tables are placed far enough apart such that each player’s decision on which flag to raise is space-like separated. Specifically the tables are placed 179 875 475 km apart and each player has 1 minute to decide which flag to raise, otherwise a black flag will be raised on their behalf (or, more realistically, the tables are placed 400 m apart and each player has 100 nanoseconds to decide which flag to raise).
The casino cannot figure out how they keep losing money on this game and, soon, Bell’s Casino goes bankrupt.
2015-02-22T23:31:25Z
There is much written about the duality between strict-order (call-by-value) evaluation for the lambda calculus and normal-order (call-by-need) evaluation (or, semantically equivalently, lazy evaluation). In the simply typed lambda calculus, all evaluation eventually terminates, so both evaluation strategies result in the same values. However, when general recursion is added to the simply typed lambda calculus (via a fixpoint operator, for example), then evaluation of some expressions does not terminate. More expressions terminate with normal-order evaluation than with strict-order evaluation. In fact, if evaluation terminates in any order, then it terminates with normal-order evaluation.

I would like to discuss the possibility of a third, even laxer evaluation strategy available for the typed lambda calculus that allows even more expressions to terminate. I did just say that normal-order evaluation is, in some sense, a best possible evaluation order, so, in order to beat it, we will be adding more redexes: the commuting conversions.

The typed lambda calculus enjoys certain commuting conversions for case expressions that allow every elimination term to pass through the case expression. For example, the commuting conversion for the π₁ elimination term and the case expression says that

    π₁(case e₀ of σ₁ x. e₁ | σ₂ y. e₂)

converts to

    case e₀ of σ₁ x. π₁(e₁) | σ₂ y. π₁(e₂)

These commuting conversions are required so that the subformula property holds. My understanding is that a corollary of this says that f(case e₀ of σ₁ x. e₁ | σ₂ y. e₂) and case e₀ of σ₁ x. f(e₁) | σ₂ y. f(e₂) are denotationally equivalent whenever f is a strict function. I would like to develop a version of the lambda calculus that allows these two expressions to denote the same value for any f. Call this the unrestricted commuting conversion property. A lambda calculus with this property would necessarily be parallel and thus will require a parallel evaluation strategy.
For example, the natural definition of or becomes the parallel-or operation.

    or x y := if x then True else y

This definition has the usual short-circuit property that or True ⊥ is True, where ⊥ is defined by

    ⊥ := fix id

If we use the unrestricted commuting conversion property, then we also have that or ⊥ True is True:

    or ⊥ True
      = {definition of or}
    if ⊥ then True else True
      = {β-expansion}
    if ⊥ then const True 〈〉 else const True 〈〉
      = {commuting}
    const True (if ⊥ then 〈〉 else 〈〉)
      = {β-reduction}
    True

Hence or is parallel-or. Other parallel functions, such as the majority function, also follow from their natural definitions.

    maj x y z := if x then (or y z) else (and y z)

In this case maj ⊥ True True is True.

    maj ⊥ True True
      = {definition of maj}
    if ⊥ then (or True True) else (and True True)
      = {evaluation of (or True True) and (and True True)}
    if ⊥ then True else True
      = {commuting}
    True

It is easy to verify that maj True ⊥ True and maj True True ⊥ are also both True.

My big question is whether we can devise some nice operational semantics for the lambda calculus that will have the unrestricted commuting conversions property that I desire. Below I document my first attempt at such operational semantics, but, spoiler alert, it does not work. The us[...]

2015-01-11T04:05:37Z
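Parallel-or is not definable in an ordinary sequential lazy language, but its intended behaviour can be sketched in Haskell by reaching for concurrency (my own illustration of the operational idea; the post itself is about denotational semantics, not this encoding):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- Race both arguments in separate threads: a True from either side
-- answers immediately, even if the other side never terminates;
-- False is only answered once both sides have produced False.
por :: IO Bool -> IO Bool -> IO Bool
por x y = do
  answer <- newEmptyMVar
  falses <- newMVar (0 :: Int)
  let run action = forkIO $ do
        b <- action
        if b
          then putMVar answer True
          else modifyMVar_ falses $ \n ->
                 if n == 1
                   then putMVar answer False >> pure 2
                   else pure (n + 1)
  mapM_ run [x, y]
  takeMVar answer
```

With this encoding, por (pure True) loops returns True, matching the behaviour the commuting-conversion derivation above assigns to or ⊥ True (with the argument roles swapped), which is why an evaluation strategy with the unrestricted property must be parallel.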
Recently I have been working on purging DSA from my computer systems. The problem with DSA and ECDSA is that they fail catastrophically when nonces are accidentally reused, or if the randomly generated nonces are biased.

At about the same time, I was pleased to discover an article on securing SSH. It gives further advice on setting up SSH, and I have proceeded to apply most of the recommendations listed there. For key exchange algorithms, the article suggests using curve25519-sha256 and falling back to diffie-hellman-group-exchange-sha256 for compatibility purposes if you must. The diffie-hellman-group-exchange-sha256 protocol allows the client and server to negotiate a prime field to perform the key exchange in. In order to avoid using the smaller prime fields, the article suggests deleting prime numbers less than 2000 bits in size from /etc/ssh/moduli.

The problem with this advice is that only the SSH server reads /etc/ssh/moduli; touching this file does nothing to secure your SSH client from using small prime fields during key negotiation. Securing the client is the important use case for diffie-hellman-group-exchange-sha256, because if you can control the server, then you will probably use curve25519-sha256 instead. However, the protocol for diffie-hellman-group-exchange-sha256 does allow the client to negotiate the field size. The problem is that this ability is not exposed for configuration in SSH.

To address this, I created a patch for OpenSSH that raises the minimum field size allowed for the diffie-hellman-group-exchange-sha256 key exchange for both the client and server. This means you do not need to edit the /etc/ssh/moduli file to increase the minimum field size for the server, but it will not hurt to do so either. If you are running NixOS you can download the patch and add it to your /etc/nixos/configuration.nix file with the following attribute.
    nixpkgs.config.packageOverrides = oldpkgs: {
      openssh = pkgs.lib.overrideDerivation oldpkgs.openssh (oldAttrs: {
        patches = oldAttrs.patches ++ [ ./openssh-dh-grp-min.patch ];
      });
    };

As an aside, I noticed that this key exchange protocol has a design flaw in it. The hash signed by the server is the hash of

    V_C || V_S || I_C || I_S || K_S || min || n || max || p || g || e || f || K

The details of what those variables stand for are not important. What is important is that there is an older format of the protocol, supported for backwards compatibility, where the hash signed by the server is the hash of

    V_C || V_S || I_C || I_S || K_S || n || p || g || e || f || K

In this older protocol, the client only requests a field size without specifying the minimum and maximum allowed bounds. This is why the variables min and max do not appear in the hash of the older protocol. A short header is sent by the client to determine which of these two versions of the protocol it is using. The problem is that this header is not part of the hashed data.

This little crack has potential to be an exploit. A MITM attacker could replace the header the client sends with the old protocol header, and then try to manipulate the remaining communication between the client and server so that both the client and server hash the same serialized byte string, allowing the server to appear to be authenticated to the client, but where the client and server are interpreting that serialized byte string in two different ways. In particular, the MITM wants the client to not be doing computation modulo some safe prime, but instead do modular arithmetic over a different ring entirely. Fortunately this particular little crack does not appear to be wide enough to exploit. The incidental properties of the serialization format do not allow a successful manipulation, at least not in practical SSH configurations. When one is sig[...]