Subscribe: Ned Batchelder's blog
http://www.nedbatchelder.com/blog/rss.xml

Ned Batchelder's blog



Ned Batchelder's personal blog.



 



How code slows as data grows

2017-10-18T05:43:20-05:00

One of the parts of the vacation talk I did in September at Boston Python was about big-O notation. I've noticed that topic seems to be some kind of dividing line for people who feel insecure about not having a computer science degree. I wanted to explain it in simple practical terms so that people could understand it well enough to inform their choices during everyday coding.

I liked how it came out, so I wrote it up as a standalone piece: Big-O: how code slows as data grows.
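As a rough illustration of the idea in the title (a minimal sketch, not taken from the talk; the function name `linear_search_steps` is made up for this example), counting the steps of a linear search shows the work growing in direct proportion to the data, which is what O(n) means in practice:

```python
def linear_search_steps(items, target):
    """Return how many items are examined before target is found."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps

# Worst case: the target is the last element, so every item is examined.
for n in (1_000, 10_000, 100_000):
    data = list(range(n))
    print(n, linear_search_steps(data, n - 1))
```

Ten times the data means ten times the steps. A set or dict lookup, by contrast, takes roughly the same time on average no matter how many items it holds.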




Beginners and experts

2017-09-23T10:13:00-05:00

I gave a talk at Boston Python the other night. It started as an exposition of the point matching algorithm I've previously written about on this blog. But as I thought about my side project more, I was interested to talk about the challenges I faced while building it. Not because the specifics were so interesting, but because they were typical problems that all software projects face.

And in particular, I wanted to underscore this point: software is hard, even for experts. Experts have the same struggles that beginners do.

I used this tweet as an illustration:

(image)

I love the raw emotion on the two boys' faces. They perfectly illustrate both the frustration and exhilaration of writing software.

But here's what beginners might not understand: beginners think beginners feel the frustration, and experts feel the exhilaration. As any expert will tell you, experts feel plenty of frustration. They feel like that left-hand kid a lot.

The difference between beginners and experts is that experts are familiar with that frustration. They encounter it all the time, as they deal with new unfamiliar technologies, or a thorny bug, or just when they are too tired to tackle the problem before them. They know the frustration is because they are facing a challenging problem. They know it isn't a reflection of their self-worth or abilities. Experts know the feeling will pass, and they have techniques for dealing with the frustration.

When beginners get frustrated, they can start to worry that they are not cut out for software, or they are dumb, or that everyone else gets it and they don't.

The good news for beginners is: this isn't about you. Software is difficult. We build layer upon layer of leaky abstractions, of higher and higher realms of virtualization and complexity. There's no natural limit to how high our towers of complexity can go, so we are forced to try to understand them, or tame them, and it's very hard. Our languages and tools are obscure and different and new ones are invented all the time. Frustration is inevitable.

The bad news for beginners is: this feeling won't stop. If you do your software career right, then you will always be a newb at something. Sure, you can master a specialty, and stick with it, but that can get boring. And that specialty might dwindle away leaving you stranded.

You will be learning new things forever. That feeling of frustration is you learning a new thing. Get used to it.




New backups: Arq to Wasabi

2017-08-27T09:25:46-05:00

This week CrashPlan announced they were ending consumer services, so I had to replace it with something else. Backups are one of those things at the unpleasant intersection of tedious, difficult, and important.

A quick spin around the latest alternatives showed the usual spectrum of possibilities, ranging from perl hackers implementing rsync themselves, to slick consumer tools. I need to have something working well not just on my computer, but others in my family, so I went the consumerish route.

Arq backing up to Wasabi seems like a good choice for polish and price.

One thing I always struggle with: how to ensure my stuff is backed up, without needlessly copying around all the crap that ends up in my home directory that I don't need backed up. On a Mac, the ~/Library directory has all sorts of stuff that I think I don't need to copy around. Do I need these?:

  • Library/Application Support
  • Library/Caches
  • Library/Containers

I add these directories to the exclusions. Should my Dropbox folder get backed up? Isn't that what Dropbox is already doing?

Then as a developer, there's tons more to exclude. Running VirtualBox? You have a 10 GB disk image somewhere under your home. I have something like 20,000 .pyc files. The .tox directory for coverage.py is 350 MB.

So I also exclude these:

  • .git
  • .hg
  • .svn
  • .tox
  • node_modules
  • .local
  • .npm
  • .vagrant.d
  • .vmdk
  • .bundle
  • .cache
  • .heroku
  • .rbenv
  • .gem
  • *.pyc
  • *.pyo
  • *$py.class

Of course, as a native Mac app for consumers, Arq doesn't provide a way for me to supply all of these at once. I have to fiddle with GUI + and - buttons, and enter them one at a time...
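The exclusion list above can at least be written down once as data. This is a minimal sketch and has nothing to do with Arq's own configuration: `EXCLUDE_PATTERNS` and `is_excluded` are hypothetical names, and the patterns are copied from the list as given. It uses the standard `fnmatch` module to test each component of a path against the patterns:

```python
import fnmatch
from pathlib import PurePosixPath

# Patterns copied from the exclusion list above.
EXCLUDE_PATTERNS = [
    ".git", ".hg", ".svn", ".tox", "node_modules", ".local", ".npm",
    ".vagrant.d", ".vmdk", ".bundle", ".cache", ".heroku", ".rbenv",
    ".gem", "*.pyc", "*.pyo", "*$py.class",
]

def is_excluded(path):
    """True if any component of path matches one of the exclusion patterns."""
    return any(
        fnmatch.fnmatchcase(part, pattern)
        for part in PurePosixPath(path).parts
        for pattern in EXCLUDE_PATTERNS
    )

print(is_excluded("project/.git/config"))   # inside an excluded directory
print(is_excluded("project/src/main.pyc"))  # matches *.pyc
print(is_excluded("project/src/main.py"))   # would be backed up
```

A check like this could be used to estimate how much of a home directory the exclusions actually cover before entering them by hand.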

Lastly, some files don't seem comfortable with backups. Thunderbird's storage files are large, and while Arq copies only certain byte ranges, they still amount to about 300 MB each time. Should I even back up my email? Should I still be using Thunderbird? Too many uncertainties....




Coverage.py podcast

2017-08-06T09:15:30-05:00

I was a guest on the Podcast.__init__ podcast this week: Coverage.py with Ned Batchelder. We talk about coverage.py, how I got started on it, why it's good, why it's not good, how it works, and so on.

And in the unlikely case that you want yet more of my dulcet tones, I was also on the Python Test podcast, mentioned in this blog post: The Value of Unit Tests.




Look around you

2017-07-23T09:58:30-05:00

I've been trying my Instagram experiment for a year now. I've really liked doing it: it gives me a prompt for looking around me and seeing what there is to see.

One of the things that surprised me when I looked back is how many pictures I took in a very small area, one that I would have thought of as uninteresting: the few blocks between where I swim in the morning, and where I work. Of the 197 pictures I took in the last year, 38 of them are from that neighborhood.

I'm not saying these are all masterpieces. But I wouldn't have thought to take them at all if I hadn't been explicitly looking for something interesting to shoot.

Look around you: not much is truly uninteresting. [...]



Finding fuzzy floats

2017-07-09T15:46:43-05:00

For a 2D geometry project I needed to find things based on 2D points. Conceptually, I wanted to have a dict that used pairs of floats as the keys. This would work, except that floats are inexact, and so have difficulty with equality checking. The "same" point might be computed in two different ways, giving slightly different values, but I want them to match each other in this dictionary.

I found a solution, though I'm surprised I didn't find it described elsewhere.

The challenges have nothing to do with the two-dimensional aspect, and everything to do with using floats, so for the purposes of explaining, I'll simplify the problem to one-dimensional points. I'll have a class with a single float in it to represent my points.

First let's look at what happens with a simple dict:

    >>> from collections import namedtuple
    >>> Pt = namedtuple("Pt", "x")
    >>>
    >>> d = {}
    >>> d[Pt(1.0)] = "hello"
    >>> d[Pt(1.0)]
    'hello'
    >>> d[Pt(1.000000000001)]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: Pt(x=1.000000000001)

As long as our floats are precisely equal, the dict works fine. But as soon as we get two floats that are meant to represent the same value, but are actually slightly unequal, then the dict is useless to us. So a simple dict is no good.

To get the program running at all, I used a dead-simple structure that I knew would work: a list of key/value pairs. By defining __eq__ on my Pt class to compare with some fuzz, I could find matches that were slightly unequal:

    >>> import math
    >>> class Pt(namedtuple("Pt", "x")):
    ...     def __eq__(self, other):
    ...         return math.isclose(self.x, other.x)
    ...
    >>> def get_match(pairs, pt):
    ...     for pt2, val in pairs:
    ...         if pt2 == pt:
    ...             return val
    ...     return None
    ...
    >>> pairs = [
    ...     (Pt(1.0), "hello"),
    ...     (Pt(2.0), "goodbye"),
    ... ]
    >>>
    >>> get_match(pairs, Pt(2.0))
    'goodbye'
    >>> get_match(pairs, Pt(2.000000000001))
    'goodbye'

This works, because now we are using an inexact closeness test to find the match. But we have an O(n) algorithm, which isn't great. Also, there's no way to define __hash__ to match that __eq__ test, so our points are no longer hashable.

Trying to make things near each other be equal naturally brings rounding to mind. Maybe that could work? Let's define __eq__ and __hash__ based on rounding the value:

    >>> class Pt(namedtuple("Pt", "x")):
    ...     def __eq__(self, other):
    ...         return round(self.x, 6) == round(other.x, 6)
    ...     def __hash__(self):
    ...         return hash(round(self.x, 6))
    ...
    >>> d = {}
    >>> d[Pt(1.0)] = "apple"
    >>> d[Pt(1.0)]
    'apple'
    >>> d[Pt(1.00000001)]
    'apple'

Nice! We have matches based on va[...]
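One way to see why a useful __hash__ cannot agree with an isclose-based __eq__ is that closeness is not transitive. The sketch below is my own illustration, not code from the post; the values are chosen to sit just inside and just outside the default `rel_tol=1e-09` of `math.isclose`:

```python
import math

a = 1.0
b = a + 0.9e-9   # within tolerance of a
c = a + 1.8e-9   # within tolerance of b, but not of a

print(math.isclose(a, b))
print(math.isclose(b, c))
print(math.isclose(a, c))
```

Equal objects must have equal hashes, so a and b would need the same hash, and so would b and c. That forces a and c to share a hash even though they compare unequal, and the chain can be extended indefinitely. Short of giving every point the same hash, nothing satisfies the rule.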



Triangular Fibonacci numbers

2017-06-17T11:56:40-05:00

Yesterday in my post about 55, I repeated Wikipedia's claim that 55 is the largest number that is both triangular and in the Fibonacci sequence. Chris Emerson commented to ask for a proof. After a moment's thought, I realized I had no idea how to prove it.

The proof is in On Triangular Fibonacci Numbers, a dense 10-page excursion into number theory I don't understand.

While I couldn't follow the proof, I can partially test the claim empirically, which leads to fun with Python and itertools, something which is much more in my wheelhouse.

I started by defining generators for triangular numbers and Fibonacci numbers:

    import itertools

    def tri():
        """Generate an infinite sequence of triangular numbers."""
        n = 0
        for i in itertools.count(start=1):
            n += i
            yield n

    print(list(itertools.islice(tri(), 50)))

    [1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105, 120, 136, 153,
     171, 190, 210, 231, 253, 276, 300, 325, 351, 378, 406, 435, 465, 496,
     528, 561, 595, 630, 666, 703, 741, 780, 820, 861, 903, 946, 990, 1035,
     1081, 1128, 1176, 1225, 1275]

    def fib():
        """Generate an infinite sequence of Fibonacci numbers."""
        a, b = 1, 1
        while True:
            yield a
            b, a = a, a+b

    print(list(itertools.islice(fib(), 50)))

    [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584,
     4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811,
     514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352,
     24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437,
     701408733, 1134903170, 1836311903, 2971215073, 4807526976, 7778742049,
     12586269025, 20365011074]

The Fibonacci sequence grows much faster!

My first impulse was to make two sets of the numbers in the sequences, and intersect them, but building a very large set took too long. So instead I wrote a function that took advantage of the ever-increasing nature of the sequences to look for equal elements in two monotonic sequences:

    def find_same(s1, s2):
        """Find equal elements in two monotonic sequences."""
        try:
            i1, i2 = iter(s1), iter(s2)
            n1, n2 = next(i1), next(i2)
            while True:
                while n1 < n2:
                    n1 = next(i1)
                if n1 == n2:
                    yield n1
                    n1 [...]
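A different way to run the same empirical check, offered as a sketch and not as the approach the post takes: instead of walking both sequences, test each Fibonacci number for triangularity directly. A number n is triangular exactly when 8n + 1 is a perfect square. The helper name `is_triangular` is made up for this example, and `math.isqrt` assumes Python 3.8 or later:

```python
import itertools
import math

def fib():
    """Fibonacci numbers 1, 2, 3, 5, 8, ... using the same generator as above."""
    a, b = 1, 1
    while True:
        yield a
        b, a = a, a + b

def is_triangular(n):
    """True if n == k*(k+1)/2 for some integer k, i.e. 8n+1 is a perfect square."""
    m = 8 * n + 1
    r = math.isqrt(m)
    return r * r == m

# Check the first 1000 Fibonacci numbers; Python ints don't overflow.
both = [n for n in itertools.islice(fib(), 1000) if is_triangular(n)]
print(both)
```

Because the Fibonacci numbers grow so quickly, this reaches numbers with hundreds of digits after only a thousand terms, and nothing larger than 55 turns up.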



Math factoid of the day: 55

2017-06-16T06:33:00-05:00

55 is in the Fibonacci sequence:

1 1 2 3 5 8 13 21 34 55 ...

55 is a triangular number:

1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55

It is the largest number that is both Fibonacci and triangular.

It is also a Kaprekar number:

55² = 3025 and 30+25 = 55
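The Kaprekar property can be checked mechanically. In this minimal sketch the function name `is_kaprekar` is my own, and it assumes the common definition that allows the square's digits to be split at any position as long as the right-hand part is nonzero:

```python
def is_kaprekar(n):
    """True if the digits of n*n can be split into two parts that sum to n."""
    digits = str(n * n)
    for i in range(1, len(digits)):
        left, right = int(digits[:i]), int(digits[i:])
        # The right-hand part must be nonzero, which rules out 10, 100, ...
        if right > 0 and left + right == n:
            return True
    return n == 1  # 1 is conventionally counted: 1*1 = 1

print(is_kaprekar(55))
print([n for n in range(1, 100) if is_kaprekar(n)])
```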




Re-ruling .rst

2017-05-12T07:26:34-05:00

Sometimes, you need a small job done, and you can write a small Python program, and it does just what you need, and it pleases you.

I have some Markdown files to convert to ReStructured Text. Pandoc does a really good job. But it chooses a different order for heading punctuation than our house style, and I didn't see a way to control it.

But it was easy to write a small thing to do the small thing:

    import re
    import sys

    # The order we want our heading rules.
    GOOD_RULES = '#*=-.~'

    # A rule is any line of all the same non-word character, 3 or more.
    RULE_RX = r"^([^\w\d])\1\1+$"

    def rerule_file(f):
        rules = {}
        for line in f:
            line = line.rstrip()
            rule_m = re.search(RULE_RX, line)
            if rule_m:
                if line[0] not in rules:
                    rules[line[0]] = GOOD_RULES[len(rules)]
                line = rules[line[0]] * len(line)
            print(line)

    rerule_file(sys.stdin)

If you aren't conversant in .rst: there's no fixed order to which punctuation means which level heading. The first rule encountered is heading 1, the next style found is heading 2, and so on.

There might be other ways to do this, but this makes me happy. [...]
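To see the remapping on a concrete input, here is a small sketch. `rerule_lines` is a hypothetical variant of the function above that returns the converted lines instead of printing them, and the sample document is invented for the illustration:

```python
import re

GOOD_RULES = '#*=-.~'
RULE_RX = r"^([^\w\d])\1\1+$"

def rerule_lines(lines):
    """Same logic as rerule_file, but collect the output lines in a list."""
    rules = {}
    out = []
    for line in lines:
        line = line.rstrip()
        if re.search(RULE_RX, line):
            # First time we see this rule character: assign the next house-style one.
            if line[0] not in rules:
                rules[line[0]] = GOOD_RULES[len(rules)]
            line = rules[line[0]] * len(line)
        out.append(line)
    return out

sample = ["Title", "=====", "", "Section", "-------", "", "Another", "-------"]
converted = rerule_lines(sample)
print("\n".join(converted))
```

The "=" rule is seen first, so it becomes "#". The "-" rule is seen second, so both of its occurrences become "*".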



Shell = Maybe

2017-04-24T10:38:57-05:00

A common Python help question: how do I get Python to run this complicated command line program? Often, the answer involves details of how shells work. I tried my hand at explaining what a shell does, why you want to avoid them, how to avoid them from Python, and why you might want to use one: Shell = Maybe.
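As a brief sketch of the "avoid them from Python" part (my own example, not taken from the linked piece): the standard `subprocess` module runs a program directly when given a list of arguments, so no shell ever interprets the text. The child program here is just the current Python interpreter echoing its first argument, chosen so the example runs anywhere:

```python
import shlex
import subprocess
import sys

# shlex.split shows how a POSIX shell would break a command line into arguments.
args = shlex.split('grep -r "hello world" .')
print(args)

# With a list of arguments there is no shell, so characters like ";" are
# passed through as ordinary data and are never treated as shell syntax.
tricky = "hello; echo pwned"
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", tricky],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```

Passing `shell=True` with a single string would hand that string to a shell instead, which is where quoting and injection problems begin.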