nameandnature: Giles from Buffy (Default)
Trump the Redeemer
Despite the claims of some liberals, the MAGA Right is not unchristian, but the apotheosis of a violent strain of entirely American Christianity.
(tags: usa politics religion racism Christianity)
How Core Git Developers Configure Git

(tags: git programming)


Originally posted at Name and Nature. You can comment there (where there are currently comments) or here.

What is Strict Aliasing and Why do we Care? · GitHub
More type punning worries.
(tags: aliasing c programming punning)

The Cursed Computer Iceberg Meme
This is fun if you’re a computer geek.
(tags: history culture programming computers)

Using Vim for C++ development

(tags: vim programming c++)

dreamwidth as vindication of a few cherished theories
DW’s co-creator on how to make a successful open source project. Via brainwane.
(tags: dreamwidth open-source)
Sync Any Folder to OneDrive in Windows 10 | Tutorials
Make a symlink from the OneDrive folder to the thing you want to sync.
(tags: onedrive backup)

Principles for the Application of Human Intelligence – Behavioral Scientist
“Before humans become the standard way in which we make decisions, we need to consider the risks and ensure implementation of human decision-making systems does not cause widespread harm.”
(tags: artificial-intelligence ai psychology parody)
A Decade of Vim
Some interesting looking Vim screencasts.
(tags: vim editor programming)

TinyPilot: Build a KVM Over IP for Under $100 · mtlynch.io
A remote keyboard and monitor with a Raspberry Pi.
(tags: kvm pi server programming)
danyspin97’s site – Colorize your CLI
More colours are good.
(tags: shell tutorial colour)
The Korean Playbook for COVID-19 (Translated) | by Indi Samarajiva | indica | Medium

(tags: covid19 epidemic medicine politics)

Age of Attention – SDr
“A leverage point in avoiding toxoplasma, is the bridge people: people who are being rewarded for taking offense, and therefore select for the worst possible behavior of the outgroup. These people act as stressors, specifically triggering ideations of worst-case-scenarios. The fix here is removing these people from your feeds/circles of influence.”
(tags: toxoplasma internet rage social-networks)


Occasionally I write about debugging, for the edification of others and to try to explain to muggles what I do all day. I ran into a fun one the other day.

Unicode

Joel Spolsky’s explanation of Unicode is excellent, but long. In brief: on a computer, we represent letters (“a”, “b” and so on) as numbers. Computers work with zeroes and ones, binary digits (or bits), usually in groups of 8 bits called bytes. Back in the mists of time, someone came up with ASCII, a way to represent decent American letters by giving each letter a number. All those numbers fitted a single byte (a byte can represent 256 different numbers), so one byte was one letter, and all was well… unless you weren’t American and wanted to represent funny foreign letters like “£”, or some non-Latin alphabet, or a frowning pile of poo.

The modern way of handling those foreign letters and poos is Unicode. Each different letter still has a number assigned to it, but there are a lot of them, so the numbers can be bigger than you can fit in a byte. Computers still like to work in bytes, so you need to represent a letter using a sequence of one or more bytes. A way of doing this is called an encoding. One popular encoding, UTF-8, has the handy feature that all those decent American letters have the same single byte representation as they did in ASCII, but other letters get longer sequences of bytes.
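To make that concrete, here is a small sketch of how many bytes UTF-8 spends on a few letters. The sample characters and the helper name utf8_lengths are my own choices for illustration, not anything from Joel's article.

```python
# Each character has a Unicode number (its code point).
# UTF-8 turns that number into a sequence of one to four bytes.
samples = ["a", "£", "\U0001F4A9"]  # the last one is U+1F4A9, the pile of poo


def utf8_lengths(chars):
    """Map each character to the number of bytes UTF-8 needs for it."""
    return {ch: len(ch.encode("utf-8")) for ch in chars}


for ch in samples:
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"U+{ord(ch):04X} -> {hex_bytes} ({len(encoded)} byte(s))")
```

The plain "a" comes out as the same single byte it had in ASCII, while "£" needs two bytes and the poo needs four.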

The Internet

The series of tubes we call the Internet is a way of carrying bytes around. As a programmer, you often end up writing code to connect to other computers and read data. Suppose we just want to sit there forever doing something with a continuous stream of bytes the other computer is sending us[1]:

connection = connect_to_the_thing()

# loop forever
while True:
    # receive up to 1024 bytes from the other computer
    # (called "data" so we don't hide Python's built-in bytes type)
    data = connection.recv(1024)
    do_something_with(data)

The data that comes back from the other computer is a series of bytes. What if you know it’s UTF-8 encoded text, and you want to turn those bytes into that text?

connection = connect_to_the_thing()

# loop forever
while True:
    # receive up to 1024 bytes from the other computer
    data = connection.recv(1024)
    # turn it into text
    text = data.decode("utf-8")
    do_something_with(text)

This seems to work fine, but very occasionally crashes on the decode() line with a mysterious error message: “UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 1023: unexpected end of data”. Whaaat?

Some frantic Googling of “UnicodeDecodeError” turns up a bunch of people getting that error because they weren’t actually reading UTF-8 encoded text at all, but something else[2]. So, you check what the other side is sending, and in this case, you’re pretty sure it is sending UTF-8. Whaaat?

Squint at the error message a bit more, and you find it’s complaining about the last byte it read. You have to give the recv() a maximum number of bytes to read, so you picked 1024 (a handy power of 2, as is traditional). “Position 1023” is the 1024th byte received (since we start counting from 0, as is traditional). That “0xe2” thing is hexadecimal E2, equivalent to 11100010 in binary. Read the UTF-8 stuff a bit more, and you find that 11100010 means “this letter is made up of this byte and the two bytes following it”. It stopped in the middle of the sequence of bytes which represent a single letter, hence the “unexpected end of data” in the error message.
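If you want to see this happen without waiting for an unlucky read, the following sketch fakes one. The payload is made up for the purpose (1023 ASCII letters followed by a three-byte curly apostrophe), and utf8_sequence_length is a hypothetical helper of mine that reads the length off the top bits of the first byte.

```python
def utf8_sequence_length(lead_byte):
    """Number of bytes in a UTF-8 sequence, judged from its first byte."""
    if lead_byte >> 7 == 0b0:
        return 1  # 0xxxxxxx: plain ASCII
    if lead_byte >> 5 == 0b110:
        return 2  # 110xxxxx
    if lead_byte >> 4 == 0b1110:
        return 3  # 1110xxxx, which is where 0xe2 lands
    if lead_byte >> 3 == 0b11110:
        return 4  # 11110xxx
    raise ValueError("not the first byte of a UTF-8 sequence")


# Invented payload: 1023 ASCII bytes, then U+2019 (bytes e2 80 99).
payload = b"a" * 1023 + "\u2019".encode("utf-8")
first_read = payload[:1024]  # what a recv(1024) might hand back

error = None
try:
    first_read.decode("utf-8")
except UnicodeDecodeError as err:
    error = err
    print(err)

print(format(first_read[-1], "08b"), utf8_sequence_length(first_read[-1]))
```

The read stops after the e2, so the decoder sees the start of a three-byte letter and then runs out of bytes.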

At this point, if you have control over the other computer, you might be thinking up cunning schemes to ensure that what it passes to each send() is always at most 1024 bytes at a time, without breaking up a multi-byte letter. After all, the data goes out in packets, so what you get when you invoke recv() must line up with the other side’s send()s, right? Wrong.

Avian carrier

The series of tubes is narrower in some places than others, and your data may be broken up to fit. A single carrier pigeon can only carry so much weight, you see, and the RSPB is pretty strict about that sort of thing. All that’s guaranteed is that you get the bytes out in the order they went in, not how many you get out at a time.
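Here is a toy simulation of that guarantee. It does not touch a real network: split_into_chunks is a made-up helper, and the chunk sizes are arbitrary numbers standing in for whatever the tubes decide to do on the day.

```python
def split_into_chunks(data, sizes):
    """Yield data in pieces of the given sizes, then whatever is left over."""
    position = 0
    for size in sizes:
        yield data[position:position + size]
        position += size
    if position < len(data):
        yield data[position:]


# What the other side passed to its two send() calls.
sent = ["£5".encode("utf-8"), " for a pigeon".encode("utf-8")]
stream = b"".join(sent)

# What our recv() calls might return: same bytes, same order, different cuts.
received = list(split_into_chunks(stream, [1, 4, 3]))

print(received)
print(b"".join(received) == stream)
```

The first chunk here is only half of the "£", even though the sender never split it.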

Fortunately, Guido thought of this and blessed us with IncrementalDecoder, which knows how to remember that it was part way through a letter when it left off, so that the next time around the loop, it’ll hopefully get the rest of the bytes and give you the letter you were hoping for:

import codecs

connection = connect_to_the_thing()

decoder_class = codecs.getincrementaldecoder("utf-8")
# Make a new instance of the decoder_class
decoder = decoder_class()

# loop forever
while True:
    # receive up to 1024 bytes from the other computer
    data = connection.recv(1024)
    # the decoder holds back any unfinished letter until the rest of it arrives
    text = decoder.decode(data)
    do_something_with(text)
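You can check the decoder's memory without a network by feeding it the nastiest chunking possible, one byte at a time. This is only a sketch: decode_chunks and the sample message are invented for the demonstration, and the final=True call at the end is my addition for streams that actually finish, which the forever loop above never does.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Decode an iterable of byte chunks, whatever their boundaries."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for chunk in chunks:
        # Returns "" while the decoder is still part way through a letter.
        pieces.append(decoder.decode(chunk))
    # final=True makes the decoder complain if a letter was left unfinished.
    pieces.append(decoder.decode(b"", final=True))
    return pieces


message = "£10 for the \U0001F4A9"
encoded = message.encode("utf-8")
chunks = [encoded[i:i + 1] for i in range(len(encoded))]  # one byte per chunk

pieces = decode_chunks(chunks)
print(pieces)
print("".join(pieces) == message)
```

The first piece is empty, because the decoder has only seen half of the "£", and the second piece is the whole letter.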

Much better! Now to raise a pull request against paramiko_expect.


  1. We’ll not worry about the other side closing the connection or the wifi packing up, for now. 

  2. I do wonder whether questions on Stack Overflow about errors from Python’s Unicode handling have more views in the aggregate than the “How do I exit Vim?” question (which is at 2.1 million views as I write this). 


post modern C tooling – draft 5

(tags: tools programming C)

‘My ties to England have loosened’: John le Carré on Britain, Boris and Brexit | Books | The Guardian
“At 87, le Carré is publishing his 25th novel. He talks to John Banville about our ‘dismal statesmanship’ and what he learned from his time as a spy”
(tags: spies intelligence MI5 MI6 le-carre politics)
The New Zealand Shootings: The Untold Stories | GQ
A moving account of the shootings and their aftermath. Via Metafilter.
(tags: shooting terrorism racism new-zealand)
How Derren Brown Remade Mind Reading for Skeptics | The New Yorker
Introducing Derren Brown to the Americans. Via Mefi.
(tags: magic derren-brown mentalism)
WSJ, WaPo, NYT Spread False Internet Law Claims | Cato @ Liberty
Rebutting nonsense about the supposed publisher/platform distinction in Section 230 of the US’s Communications Decency Act. From the Cato Institute, so can’t be dismissed as leftist propaganda.
(tags: law censorship internet)

Towards an understanding of technical debt | Kellan Elliott-McCrea
Different categories of things which all get lumped together as technical debt.
(tags: programming technical-debt software)

Type punning isn’t funny: Using pointers to recast in C is bad.
A common C programming technique (casting between pointers to structures) leads to problems when strict aliasing is turned on (as it is if you set -O2 or -O3 in gcc).
(tags: C programming casting punning)
Type Punning, Strict Aliasing, and Optimization – Embedded in Academia
More on the type punning/aliasing business.
(tags: C punning aliasing programming)

10-Best-VIM-Cheat-Sheet-02.jpg (1979×1346)
A handy Vim / Vi cheatsheet
(tags: vim programming editor)

Deep C (and C++)
The differences between shallow and deep understanding in C/C++ or how to ace the technical interview.
(tags: programming C interview)

Delta Pointers: Buffer Overflow Checks Without the Checks
Using the top bytes of pointers to implement efficient out-of-bounds detection.
(tags: security C pointer programming overflow)

Ask HN: Best way to learn modern C++? | Hacker News
Thread with book and video recommendations
(tags: c++ programming)
Epistemic extremism – UseOfReason
Contra Internet (“shoe”) atheism: I don’t need to be able to prove a thing to you before I can rationally believe it.
(tags: philosophy belief epistemology Atheism proof)
Compressing and enhancing hand-written notes
How Office Lens might do it, but open source. Introduces various colour spaces.
(tags: images python colour RGB HSV)

Unrolled thread from @patio11
“Some people really benefit from hearing advice that everyone knows, for the same reason we keep schools open despite every subject in them having been taught before.” Mostly related to the tech business.

In that spirit, here’s some quick Things Many People Find Too Obvious To Have Told You Already.
(tags: programming business technology)


What every systems programmer should know about lockless concurrency
“Seasoned programmers are familiar with concurrency building blocks like mutexes, semaphores, and condition variables. But what makes them work? How do we write concurrent code when we can’t use them, like when we’re working below the operating system in an embedded environment, or when we can’t block due to hard time constraints? And since your system transforms your code into things you didn’t write, running in orders you never asked for, how do multithreaded programs work at all? Concurrency—especially on modern hardware—is a complicated and unintuitive topic, but let’s try to cover some fundamentals.”
(tags: concurrency programming mutex)


Page generated Jan. 2nd, 2026 12:30 pm