
Ned Batchelder's blog

Ned Batchelder's personal blog.


Iter-tools for puzzles: oddity


It’s December, which means Advent of Code is running again. It provides a new two-part puzzle every day until Christmas. They are a lot of fun, and usually are algorithmic in nature.

One of the things I like about the puzzles is that they often lend themselves to writing unusual but general-purpose helpers. As I have said before, abstraction of iteration is a powerful and under-used feature of Python, so I enjoy exploring it when the opportunity arises.

For yesterday’s puzzle I needed to find the one unusual value in an otherwise uniform list. This is the kind of thing that might be in itertools if itertools had about ten times more functions than it does now. Here was my definition of the needed function:

    def oddity(iterable, key=None):
        """
        Find the element that is different.

        The iterable has at most one element different than the others. If a
        `key` function is provided, it is used to extract a comparison key
        from each element; otherwise the elements themselves are compared.

        Two values are returned: the common comparison key, and the different
        element.

        If all of the elements are equal, then the returned different element
        is None. If there is more than one different element, an error is
        raised.
        """

The challenge I set for myself was to implement this function in as general and useful a way as possible. The iterable might not be a list; it could be a generator, or some other iterable. There are edge cases to consider, like what happens if there are more than two different values.

If you want to take a look, my code is on GitHub (with tests, natch). Fair warning: that repo has my solutions to all of the Advent of Code problems so far this year.

One problem with my implementation: it stores all the values from the iterable. For the actual Advent of Code puzzle that was fine, since it only had to deal with fewer than 10 values.
But how would you change the code so that it didn’t store them all?

My code also assumes that the comparison values are hashable. What if you didn’t want to require that?

Suppose the iterable could be infinite? This changes the definition somewhat. You can’t detect the case of all the values being the same, since there’s no such thing as “all” the values. And you can’t detect having more than two distinct values, since you’d have to read values forever on the possibility that it might happen. How would you change the code to handle infinite iterables?

These are the kinds of considerations you have to take into account to write truly general-purpose itertools functions. It’s an interesting programming exercise to work through how each version would differ.

BTW: it might be that there is a way to implement my oddity function with a clever combination of things already in itertools. If so, let me know!
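For comparison, here is one way the function could be written. This is a minimal sketch, not the implementation on GitHub, and it has exactly the limitations discussed above: it stores all the values and requires hashable comparison keys.

```python
from collections import defaultdict

def oddity(iterable, key=None):
    """Find the one element whose comparison key differs from the others.

    Returns (common_key, different_element). If all elements are equal,
    the different element is None. If more than one element differs,
    ValueError is raised.
    """
    key = key or (lambda v: v)

    # Group the elements by their comparison key. This is what forces us
    # to store everything and to require hashable keys.
    groups = defaultdict(list)
    for element in iterable:
        groups[key(element)].append(element)

    if len(groups) == 1:
        # All elements share one key: there is no odd one out.
        (common_key,) = groups
        return common_key, None

    if len(groups) == 2:
        # Exactly one of the two groups must hold a single element.
        (k1, v1), (k2, v2) = groups.items()
        if len(v1) == 1 and len(v2) > 1:
            return k2, v1[0]
        if len(v2) == 1 and len(v1) > 1:
            return k1, v2[0]

    # Zero groups, more than two, or two ambiguous singletons.
    raise ValueError("More than one element is different")
```

For example, `oddity([3, 3, 3, 7, 3])` returns `(3, 7)`, and a `key=len` call compares strings by length instead of by value.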

Bite-sized command line tools: pylintdb


One of the things I love about Python is the abundance of handy libraries for cobbling together small but useful tools. At work we had a large pylint report, and I wanted to understand it better. In particular, I wanted to trace back to which commit had introduced each violation. I wrote pylintdb to do the work.

Since we had a lot of violations (>5000!) I figured it would take some time to use git blame to find the commit for each line. I wanted a way to persist the progress through the lines. SQLite seemed like a good choice. It also would give me ad-hoc queryability, though to be honest, I didn’t even consider that at the time.

SQLite is part of the Python standard library, but there’s a third-party library that makes it super-convenient to use. The dataset library lets you use a database without creating a schema or even a model first. You just open a database, choose a table name, and then start writing dictionaries to it. It handles all the schema creation (or modification!) behind the scenes. Awesome.

These days, click is the tool of choice for command-line parsing and other chores needed in the terminal. I used its progress bar functions. They aren’t perfect, but in only a few lines I had a workable indicator.
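A progress bar with click really is just a couple of lines; this sketch uses a toy loop in place of the real per-line work:

```python
import click

results = []

# click.progressbar wraps any iterable and renders a progress bar in the
# terminal while you loop; each iteration advances the bar automatically.
with click.progressbar(range(5), label="blaming lines") as bar:
    for lineno in bar:
        results.append(lineno)  # stand-in for the per-line git blame work
```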

Other useful things from the Python standard library:

  • concurrent.futures for parallelizing the git blame work. It’s got a high-level “map” interface that did exactly what I needed without having to think about queues, threads, and so on.
  • subprocess.check_output does the subprocess thing people usually want: just run the command and give me the output.
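Put together, those two pieces are enough for the parallel blame step. Here is a standard-library-only sketch; it uses a stand-in command instead of the real `git blame` invocation so it can run outside a git repository:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run(cmd):
    # check_output: run the command, raise on failure, return its stdout.
    return subprocess.check_output(cmd, universal_newlines=True).strip()

# In a tool like pylintdb each command would be a `git blame` for one
# violation's file and line; echo keeps this sketch self-contained.
commands = [["echo", f"line {n}"] for n in range(5)]

# executor.map works like the builtin map, but runs the calls in parallel
# threads, with no queues or locks to manage. Results come back in order.
with ThreadPoolExecutor(max_workers=4) as executor:
    outputs = list(executor.map(run, commands))
```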

pylintdb isn’t earth-shattering, it just does exactly what I needed in 120 lines with a minimum of fuss, thanks to dataset, click, and Python.

New design


I’ve just updated the design of this site. The goal was to make it responsive, so that it would work well on small screens, but I made other changes along the way. The body type is now serif rather than sans, and much larger. I made lots of other tweaks as I worked on pages.

Making a responsive design was fun: it meant working out mechanisms for the layout rather than just a static design.

Of course, it’s easy to get carried away. Take a look at what happens to my name in the header when the screen gets below 300 pixels: Ned Batchelder becomes nedbat to save space. This was accomplished with the help of a span with class “chelder”...

It took me a long time to make this design. I started it 15 months ago, but stopped work on it for more than a year. I picked it up again two weeks ago, and powered through the remaining work.

Behind the scenes, I changed only one thing: using Sass to generate the CSS. The rest is still as janky and difficult as always.

For comparison (and posterity), here is the design I just replaced. If anything seems amiss with the new design, just let me know.

Candy in my pocket


Let me tell you a modern-day horror story: for almost ten days, I didn’t have a phone!

I was on a day trip to Montréal, and my phone just completely died. I thought maybe it just needed a charge, but nope, nothing would bring it back. I had a nicely sculpted chunk of glass.

(Side note: I had been texting with Susan, so eventually I dashed off a quick email: “My phone is completely dead. I can’t even tell what time it is.” She sent back an email that said just, “It’s 11:24.” Is it any wonder I love her?)

At first, I felt a bit lost. I couldn’t take pictures, I couldn’t use maps, I couldn’t text with Susan to coordinate getting picked up at the airport.

But what I noticed is that much of what I was used to doing with my phone, I didn’t really miss. I didn’t have games to jump to when I had a free moment. I wasn’t able to reflexively look up interesting tidbits. I couldn’t anxiously check if I had gotten an interesting email.

I realized I didn’t need those things. It was OK to not have a smaller screen to use when I wasn’t using my larger screen. I started to feel like the phone was like a bag of candy in my pocket. If my pocket is full of candy, I’ll nibble on candy all day long. But when I didn’t have a bag of candy, I didn’t miss it. Sure, sometimes I’d like to have a piece of candy, but not nearly as much as I ate when I always had a bag of candy with me.

Now I finally (USPS can be disappointing...) have a new phone, a Google Pixel. I’ll be glad to have my podcasts back. I can take pictures again. I was lucky not to need a two-factor app in the last week.

I’ll have to see how I relate to it. I’ll have the candy with me, but will I be able to resist nibbling? I wasn’t as bad as some: I never had the impulse to read my phone while standing at a urinal, for example. But I nibbled a lot more than it turns out I needed to. I’ll try to keep in mind how these last ten days felt, and snack responsibly.

Finding your first OSS project


Running in the circles I do, I often hear the question, “Where’s a good open source project to start off contributing to?” The last time this came up, I asked on Twitter and got some good replies.

The best answers pointed to two aggregators of projects. These sites collect links to projects that have special labels for bug reports that are good for first-time contributors to work on. The presence of these labels is a good indicator that the project is well-maintained, welcoming to newcomers, and prepared for their contributions.

  • Up For Grabs lists dozens of projects, helpfully showing how many open first-timer issues each has.
  • Awesome for Beginners is lower-tech, but also lists projects with links to their first-timer tagged issues.

I also got links to some useful advice for first-time contributors.

Making a first contribution can be overwhelming. Keep looking through these resources until you find something that makes it feel do-able.

Toxic experts


I wrote Big-O: how code slows as data grows to explain Big-O notation. My goal was to explain it in broad strokes for people new to the idea. It grew out of thinking I’d been doing about beginners and experts. In the comments, there was an unexpected and unfortunate real-world lesson.

An anonymous coward named “pyon” said I should be ashamed. They pointed out a detail of algorithmic analysis that I had not mentioned. It’s a detail that I had never encountered before. I think it’s an interesting detail, but not one that needed to be included.

Pyon is an example of a toxic expert. People like this know a lot, but they use that knowledge to separate themselves from the unwashed masses of newbs. Rather than teach, they choose to sneer from their lofty perches, lamenting the state of the world around them, filled as it is with People Who Don’t Want To Learn.

The important skill pyon and other toxic experts are missing is how to connect with people. They could use their knowledge to teach, but it’s more important to them to separate themselves from others. Points of correctness are useless without points of connection.

Toxic experts care more about making distinctions between people to elevate themselves than they do about helping people. Beware: they are everywhere you look in the tech world. It’s easy to run into them when you are trying to learn. Ignore them. They don’t know you, and they don’t know what you can do.

Pyon is fixated on a particular detail of algorithmic analysis, and feels that it is essential to understanding Big-O. What I can tell you is that I am doing fine in my 30-year career, and I had never heard that particular detail. My Big-O piece wasn’t meant to be exhaustive. There are entire books written about algorithmic notation. I even wrote at the end, “There’s much more to algorithm analysis if you want to get into the true computer science aspects of it, but this is enough for working developers.”

But pyon can’t see the forest for the trees. Experts have spent a lot of time and energy learning what they know. They love their knowledge. They wouldn’t have been able to get where they are without a passion for the subject. But sometimes they have a hard time seeing how people can be successful without that well-loved knowledge. They’ve lost sight of what it means to be a beginner, and what beginners need to learn.

Toxic experts will latch onto a particular skill and decide that it is essential. For them, that skill is a marker dividing Those-Who-Know from Those-Who-Don’t. These shibboleths vary from expert to expert. In the current case, it’s a detail of algorithmic analysis. I’ve seen other toxic experts insist that it’s essential to know C, or assembly language, or recursion and pointers, and so on.

I’m not saying those aren’t good things to know. The more you know, the better. Every one of these topics will be useful. But they are not essential. You can do good work without them. You certainly don’t deserve to be spat upon.

The ultimate irony is that while pyon and other toxic experts are bemoaning the state of the industry because of missing knowledge, they are highlighting the main skill gap the tech industry needs to fix: empathy.

How code slows as data grows


One of the parts of the vacation talk I did in September at Boston Python was about big-O notation. I’ve noticed that topic seems to be some kind of dividing line for people who feel insecure about not having a computer science degree. I wanted to explain it in simple practical terms so that people could understand it well enough to inform their choices during everyday coding.

I liked how it came out, so I wrote it up as a standalone piece: Big-O: how code slows as data grows.
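To give one flavor of the practical framing (this example is mine, not one from the piece): the same membership test is O(n) on a list but O(1) on a set, and a quick timing run makes the difference visible.

```python
import timeit

n = 100_000
needle = n - 1  # worst case for the list: the last element
haystack_list = list(range(n))
haystack_set = set(haystack_list)

# A list scans its elements one by one: O(n) per lookup.
list_time = timeit.timeit(lambda: needle in haystack_list, number=100)

# A set hashes the key straight to its bucket: O(1) per lookup.
set_time = timeit.timeit(lambda: needle in haystack_set, number=100)

# As the data grows, list_time grows with n while set_time stays flat.
```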

Beginners and experts


I gave a talk at Boston Python the other night. It started as an exposition of the point-matching algorithm I’ve previously written about on this blog. But as I thought more about my side project, I realized I wanted to talk about the challenges I faced while building it. Not because the specifics were so interesting, but because they were typical problems that all software projects face.

And in particular, I wanted to underscore this point: software is hard, even for experts. Experts have the same struggles that beginners do.

I used this tweet as an illustration:


I love the raw emotion on the two boys’ faces. They perfectly illustrate both the frustration and exhilaration of writing software.

But here’s what beginners might not understand: beginners think that beginners feel the frustration, and experts feel the exhilaration. As any expert will tell you, experts feel plenty of frustration. They feel like that left-hand kid a lot.

The difference between beginners and experts is that experts are familiar with that frustration. They encounter it all the time, as they deal with new unfamiliar technologies, or a thorny bug, or just when they are too tired to tackle the problem before them. They know the frustration is because they are facing a challenging problem. They know it isn’t a reflection of their self-worth or abilities. Experts know the feeling will pass, and they have techniques for dealing with the frustration.

When beginners get frustrated, they can start to worry that they are not cut out for software, or they are dumb, or that everyone else gets it and they don’t.

The good news for beginners is: this isn’t about you. Software is difficult. We build layer upon layer of leaky abstractions, of higher and higher realms of virtualization and complexity. There’s no natural limit to how high our towers of complexity can go, so we are forced to try to understand them, or tame them, and it’s very hard. Our languages and tools are obscure and different and new ones are invented all the time. Frustration is inevitable.

The bad news for beginners is: this feeling won’t stop. If you do your software career right, then you will always be a newb at something. Sure, you can master a specialty, and stick with it, but that can get boring. And that specialty might dwindle away leaving you stranded.

You will be learning new things forever. That feeling of frustration is you learning a new thing. Get used to it.

New backups: Arq to Wasabi


This week CrashPlan announced they were ending consumer services, so I had to replace it with something else. Backups are one of those things at the unpleasant intersection of tedious, difficult, and important.

A quick spin around the latest alternatives showed the usual spectrum of possibilities, ranging from perl hackers implementing rsync themselves, to slick consumer tools. I need to have something working well not just on my computer, but others in my family, so I went the consumerish route.

Arq backing up to Wasabi seems like a good choice for polish and price.

One thing I always struggle with: how to ensure my stuff is backed up without needlessly copying around all the crap that ends up in my home directory that I don’t need backed up. On a Mac, the ~/Library directory has all sorts of stuff that I think I don’t need to copy around. Do I need these?

  • Library/Application Support
  • Library/Caches
  • Library/Containers

I add these directories to the exclusions. Should my Dropbox folder get backed up? Isn’t that what Dropbox is already doing?

Then as a developer, there’s tons more to exclude. Running VirtualBox? You might have a 10Gb disk image somewhere under your home directory. I have something like 20,000 .pyc files. The .tox directory for one project is 350Mb.

So I also exclude these:

  • .git
  • .hg
  • .svn
  • .tox
  • node_modules
  • .local
  • .npm
  • .vagrant.d
  • .vmdk
  • .bundle
  • .cache
  • .heroku
  • .rbenv
  • .gem
  • *.pyc
  • *.pyo
  • *$py.class

Of course, as a native Mac app for consumers, Arq doesn’t provide a way for me to supply all these at once. I have to fiddle with GUI + and - buttons, and enter them one at a time...

Lastly, some files don’t seem comfortable with backups. Thunderbird’s storage files are large, and while Arq copies only certain byte ranges, they still amount to about 300Mb each time. Should I even back up my email? Should I still be using Thunderbird? Too many uncertainties....

podcast


I was a guest on the Podcast.__init__ podcast this week: with Ned Batchelder. We talk about how I got started on it, why it’s good, why it’s not good, how it works, and so on.

And in the unlikely case that you want yet more of my dulcet tones, I was also on the Python Test podcast, mentioned in this blog post: The Value of Unit Tests.