<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dw="https://www.dreamwidth.org">
  <id>tag:dreamwidth.org,2014-08-04:2299717</id>
  <title>nameandnature</title>
  <subtitle>nameandnature</subtitle>
  <author>
    <name>nameandnature</name>
  </author>
  <link rel="alternate" type="text/html" href="https://nameandnature.dreamwidth.org/"/>
  <link rel="self" type="text/xml" href="https://nameandnature.dreamwidth.org/data/atom"/>
  <updated>2020-07-23T00:13:06Z</updated>
  <dw:journal username="nameandnature" type="personal"/>
  <entry>
    <id>tag:dreamwidth.org,2014-08-04:2299717:245928</id>
    <link rel="alternate" type="text/html" href="https://nameandnature.dreamwidth.org/245928.html"/>
    <link rel="self" type="text/xml" href="https://nameandnature.dreamwidth.org/data/atom/?itemid=245928"/>
    <title>Link blog: format, python, flying, programming</title>
    <published>2020-07-23T00:13:06Z</published>
    <updated>2020-07-23T00:13:06Z</updated>
    <category term="python"/>
    <category term="f 22"/>
    <category term="aviation"/>
    <category term="flying"/>
    <category term="programming"/>
    <category term="link blog"/>
    <category term="aircraft"/>
    <category term="physics"/>
    <category term="military"/>
    <category term="format"/>
    <dw:security>public</dw:security>
    <dw:reply-count>0</dw:reply-count>
    <content type="html">&lt;dl&gt;
&lt;dt&gt;&lt;a href="https://pyformat.info/"&gt;PyFormat: Using % and .format() for great good!&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;Python string formatting guide.&lt;br /&gt;&lt;small&gt;(tags: &lt;a href="http://pinboard.in/u:pw201/t:python"&gt;python&lt;/a&gt; &lt;a href="http://pinboard.in/u:pw201/t:programming"&gt;programming&lt;/a&gt; &lt;a href="http://pinboard.in/u:pw201/t:format"&gt;format&lt;/a&gt;)&lt;/small&gt;&lt;/dd&gt;
&lt;dt&gt;&lt;a href="https://www.youtube.com/watch?v=22u4qxm1YjY"&gt;MIT Private Pilot Ground School 2019, F-22 Flight Controls &amp;#8211; YouTube&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;Fascinating talk from an F-22 pilot.&lt;br /&gt;&lt;small&gt;(tags: &lt;a href="http://pinboard.in/u:pw201/t:aircraft"&gt;aircraft&lt;/a&gt; &lt;a href="http://pinboard.in/u:pw201/t:physics"&gt;physics&lt;/a&gt; &lt;a href="http://pinboard.in/u:pw201/t:F-22"&gt;F-22&lt;/a&gt; &lt;a href="http://pinboard.in/u:pw201/t:flying"&gt;flying&lt;/a&gt; &lt;a href="http://pinboard.in/u:pw201/t:aviation"&gt;aviation&lt;/a&gt; &lt;a href="http://pinboard.in/u:pw201/t:military"&gt;military&lt;/a&gt;)&lt;/small&gt;&lt;/dd&gt;
&lt;/dl&gt;
&lt;hr&gt;
&lt;p&gt;&lt;i&gt;Originally posted at &lt;a href="https://www.noctua.org.uk/blog/2020/07/23/link-blog-format-python-flying-programming/"&gt;Name and Nature&lt;/a&gt;. You can &lt;a href="https://www.noctua.org.uk/blog/2020/07/23/link-blog-format-python-flying-programming/#comments"&gt;comment there&lt;/a&gt; or here.&lt;/i&gt;&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>tag:dreamwidth.org,2014-08-04:2299717:243016</id>
    <link rel="alternate" type="text/html" href="https://nameandnature.dreamwidth.org/243016.html"/>
    <link rel="self" type="text/xml" href="https://nameandnature.dreamwidth.org/data/atom/?itemid=243016"/>
    <title>UnicodeDecodeError with stuff from the network</title>
    <published>2020-06-03T17:55:24Z</published>
    <updated>2020-07-15T14:39:19Z</updated>
    <category term="unicode"/>
    <category term="network"/>
    <category term="blog"/>
    <category term="python"/>
    <category term="programming"/>
    <dw:security>public</dw:security>
    <dw:reply-count>0</dw:reply-count>
    <content type="html">&lt;p&gt;Occasionally I write about debugging, for the edification of others and to try to explain to muggles what I do all day. I ran into a fun one the other day.&lt;/p&gt;



&lt;h3&gt;Unicode&lt;/h3&gt;



&lt;div class="wp-block-image"&gt;&lt;figure class="alignright is-resized"&gt;&lt;img src="https://www.jwz.org/images/2017/frowning-pile-of-poo.png" alt="" width="206" height="206" /&gt;&lt;/figure&gt;&lt;/div&gt;



&lt;p&gt;Joel Spolsky&amp;#8217;s &lt;a href="https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/"&gt;explanation of Unicode&lt;/a&gt; is excellent, but long. In brief: on a computer, we represent letters (&amp;#8220;a&amp;#8221;, &amp;#8220;b&amp;#8221; and so on) as numbers. Computers work with zeroes and ones, binary digits (or &lt;em&gt;bits&lt;/em&gt;), usually in groups of 8 bits called &lt;em&gt;bytes&lt;/em&gt;. Back in the mists of time, someone came up with &lt;a href="https://en.wikipedia.org/wiki/ASCII"&gt;ASCII&lt;/a&gt;, a way to represent decent American letters by giving each letter a number. All those numbers fitted a single byte (a byte can represent 256 different numbers), so one byte was one letter, and all was well&amp;#8230; unless you weren&amp;#8217;t American and wanted to represent funny foreign letters like &amp;#8220;£&amp;#8221;, or some non-Latin alphabet, or a &lt;a href="https://www.jwz.org/blog/2017/11/unicode-character-frowning-pile-of-poo-u1f979/"&gt;frowning pile of poo&lt;/a&gt;.&lt;/p&gt;



&lt;p&gt;The modern way of handling those foreign letters and poos is Unicode. Each different letter still has a number assigned to it, but there are a lot of them, so the numbers can be bigger than you can fit in a byte. Computers still like to work in bytes, so you need to represent a letter using a sequence of one or more bytes. A way of doing this is called an &lt;em&gt;encoding&lt;/em&gt;. One popular encoding, &lt;a href="https://en.wikipedia.org/wiki/UTF-8"&gt;UTF-8&lt;/a&gt;, has the handy feature that all those decent American letters have the same single byte representation as they did in ASCII, but other letters get longer sequences of bytes.&lt;/p&gt;
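
&lt;p&gt;To make that concrete, here is a small sketch (an illustration only, not part of the original debugging session) that encodes a few characters as UTF-8 and counts the bytes each one needs. The helper name &lt;code&gt;utf8_length&lt;/code&gt; is made up for this example.&lt;/p&gt;

```python
def utf8_length(character):
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(character.encode("utf-8"))


# An ASCII letter, the pound sign, the euro sign and a pile of poo.
for character in ["a", "\u00a3", "\u20ac", "\U0001F4A9"]:
    encoded = character.encode("utf-8")
    # ascii() keeps the output printable on any terminal.
    print(ascii(character), encoded.hex(), utf8_length(character))
```

&lt;p&gt;The ASCII letter needs a single byte, and the others need two, three and four bytes respectively.&lt;/p&gt;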



&lt;h3&gt;The Internet&lt;/h3&gt;



&lt;p&gt;The &lt;a href="https://en.wikipedia.org/wiki/Series_of_tubes"&gt;series of tubes&lt;/a&gt; we call the Internet is a way of carrying bytes around. As a programmer, you often end up writing code to connect to other computers and read data. Suppose we just want to sit there forever doing something with a continuous stream of bytes the other computer is sending us&lt;sup&gt;&lt;a href="#fn1-178798" title="We’ll not worry about the other side closing the connection or the wifi packing up, for now." rel="footnote"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;



&lt;div style="height: 250px; position:relative; margin-bottom: 50px;" class="wp-block-simple-code-block-ace"&gt;&lt;pre class="wp-block-simple-code-block-ace" data-mode="python" data-theme="monokai" data-fontsize="14" data-lines="Infinity" data-showlines="true" data-copy="false"&gt;connection = connect_to_the_thing()

# loop forever
while True:
    # receive up to 1024 bytes from the other computer
    data = connection.recv(1024)
    do_something_with(data)&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The data that comes back from the other computer is a series of bytes. What if you know it&amp;#8217;s UTF-8 encoded text, and you want to turn those bytes into that text?&lt;/p&gt;



&lt;div style="height: 250px; position:relative; margin-bottom: 50px;" class="wp-block-simple-code-block-ace"&gt;&lt;pre class="wp-block-simple-code-block-ace" data-mode="python" data-theme="monokai" data-fontsize="14" data-lines="Infinity" data-showlines="true" data-copy="false"&gt;connection = connect_to_the_thing()

# loop forever
while True:
    # receive up to 1024 bytes from the other computer
    data = connection.recv(1024)
    # turn it into text
    text = data.decode("utf-8")
    do_something_with(text)&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This seems to work fine, but very occasionally crashes on the &lt;code&gt;decode()&lt;/code&gt; line with a mysterious error message: &amp;#8220;UnicodeDecodeError: &amp;#8216;utf-8&amp;#8217; codec can&amp;#8217;t decode byte 0xe2 in position 1023: unexpected end of data&amp;#8221;. Whaaat?&lt;/p&gt;



&lt;p&gt;Some frantic Googling of &amp;#8220;UnicodeDecodeError&amp;#8221; turns up a bunch of people getting that error because they weren&amp;#8217;t actually reading UTF-8 encoded text at all, but something else&lt;sup&gt;&lt;a href="#fn2-178798" title="I do wonder whether questions on Stack Overflow about errors from Python’s Unicode handling have more views in the aggregate than the “&amp;lt;a href=&amp;quot;https://stackoverflow.com/questions/11828270/how-do-i-exit-the-vim-editor&amp;quot;&amp;gt;How do I exit Vim&amp;lt;/a&amp;gt;?” question (which is at 2.1 million views as I write this)." rel="footnote"&gt;2&lt;/a&gt;&lt;/sup&gt;. So, you check what the other side is sending, and in this case, you&amp;#8217;re pretty sure it &lt;em&gt;is&lt;/em&gt; sending UTF-8. Whaaat?&lt;/p&gt;



&lt;p&gt;Squint at the error message a bit more, and you find it&amp;#8217;s complaining about the last byte it&amp;#8217;s read. You have to &lt;a href="https://docs.python.org/3/library/socket.html#socket.socket.recv"&gt;give the &lt;code&gt;recv()&lt;/code&gt; a maximum number of bytes to read&lt;/a&gt;, so you picked 1024 (a handy power of 2, as is traditional). &amp;#8220;Position 1023&amp;#8221; is the 1024th byte received (since we start counting from 0, as is traditional). That &amp;#8220;0xe2&amp;#8221; thing is &lt;a href="https://www.mathsisfun.com/hexadecimals.html"&gt;hexadecimal&lt;/a&gt; E2, equivalent to 11100010 in binary. Read the &lt;a href="https://en.wikipedia.org/wiki/UTF-8#Description"&gt;UTF-8 stuff&lt;/a&gt; a bit more, and you find that 11100010 means &amp;#8220;this letter is made up of this byte and the two bytes following this one&amp;#8221;. It stopped in the middle of the sequence of bytes which represent a single letter, hence the &amp;#8220;unexpected end of data&amp;#8221; in the error message.&lt;/p&gt;
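
&lt;p&gt;The bit-squinting can be sketched in a few lines of Python. This is only an illustration: &lt;code&gt;expected_sequence_length&lt;/code&gt; is a hypothetical helper, and it only looks at the leading bit pattern, so it does not reject the handful of lead bytes that UTF-8 forbids.&lt;/p&gt;

```python
def expected_sequence_length(lead_byte):
    """Return the length of the UTF-8 sequence that starts with lead_byte.

    Returns 0 when the byte cannot start a sequence, for example a
    continuation byte of the form 10xxxxxx.
    """
    bits = format(lead_byte, "08b")
    if bits.startswith("0"):
        return 1  # plain ASCII, one byte on its own
    if bits.startswith("110"):
        return 2
    if bits.startswith("1110"):
        return 3
    if bits.startswith("11110"):
        return 4
    return 0


print(format(0xE2, "08b"))             # 11100010
print(expected_sequence_length(0xE2))  # 3
```

&lt;p&gt;So a chunk which ends in 0xE2 is two bytes short of a complete letter.&lt;/p&gt;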



&lt;p&gt;At this point, if you have control over the other computer, you might be thinking up cunning schemes to ensure that what it passes to each &lt;a href="https://docs.python.org/3/library/socket.html#socket.socket.send"&gt;&lt;code&gt;send()&lt;/code&gt;&lt;/a&gt; is always less than 1024 bytes at a time, without breaking up a multi-byte letter. After all, the data goes out in &lt;a href="https://en.wikipedia.org/wiki/Network_packet"&gt;packets&lt;/a&gt;, so what you get when you invoke &lt;code&gt;recv()&lt;/code&gt; must line up with the other side&amp;#8217;s &lt;code&gt;send()&lt;/code&gt;s, right? Wrong.&lt;/p&gt;



&lt;div class="wp-block-image"&gt;&lt;figure class="alignright is-resized"&gt;&lt;a href="https://commons.wikimedia.org/wiki/File:Homing_pigeon.jpg"&gt;&lt;img src="https://pics.livejournal.com/pw201/pic/000f70dx/s320x320" alt="" width="217" height="174" /&gt;&lt;/a&gt;&lt;figcaption&gt;Avian carrier&lt;/figcaption&gt;&lt;/figure&gt;&lt;/div&gt;



&lt;p&gt;The series of tubes is narrower in some places than others, and your data &lt;a href="https://en.wikipedia.org/wiki/IP_fragmentation"&gt;may be broken up to fit&lt;/a&gt;. A single &lt;a href="https://tools.ietf.org/html/rfc1149"&gt;carrier pigeon&lt;/a&gt; can only carry so much weight, you see, and the RSPB is pretty strict about that sort of thing. All that&amp;#8217;s guaranteed is that you get the bytes out in the order they went in, not how many you get out at a time.&lt;/p&gt;
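
&lt;p&gt;Here is a toy simulation of that, with no real network involved. &lt;code&gt;split_into_chunks&lt;/code&gt; is a hypothetical stand-in for whatever the tubes do to your data, and the chunk size of 4 is chosen purely to land in the middle of multi-byte letters.&lt;/p&gt;

```python
def split_into_chunks(data, chunk_size):
    """Cut data into pieces of at most chunk_size bytes, ignoring letter boundaries."""
    return [data[start:start + chunk_size] for start in range(0, len(data), chunk_size)]


# "cafe" with an acute accent, a space, a euro sign and a 5.
message = "caf\u00e9 \u20ac5".encode("utf-8")
chunks = split_into_chunks(message, 4)

failed_chunks = []
for chunk in chunks:
    try:
        chunk.decode("utf-8")
    except UnicodeDecodeError as error:
        failed_chunks.append(chunk)
        print(error)

# The bytes still arrive complete and in order, so the whole stream decodes fine.
print(ascii(b"".join(chunks).decode("utf-8")))
```

&lt;p&gt;Every chunk fails to decode on its own, either because it ends part way through a letter or because it starts part way through one, yet the bytes joined back together are perfectly good UTF-8.&lt;/p&gt;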



&lt;p&gt;Fortunately, &lt;a href="https://en.wikipedia.org/wiki/Guido_van_Rossum"&gt;Guido&lt;/a&gt; thought of this and blessed us with &lt;code&gt;&lt;a href="https://docs.python.org/3/library/codecs.html#incrementaldecoder-objects"&gt;IncrementalDecoder&lt;/a&gt;&lt;/code&gt;, which knows how to remember that it was part way through a letter when it left off, so that the next time around the loop, it&amp;#8217;ll hopefully get the rest of the bytes and give you the letter you were hoping for:&lt;/p&gt;



&lt;div style="height: 250px; position:relative; margin-bottom: 50px;" class="wp-block-simple-code-block-ace"&gt;&lt;pre class="wp-block-simple-code-block-ace" data-mode="python" data-theme="monokai" data-fontsize="14" data-lines="Infinity" data-showlines="true" data-copy="false"&gt;import codecs

connection = connect_to_the_thing()

decoder_class = codecs.getincrementaldecoder("utf-8")
# Make a new instance of the decoder_class
decoder = decoder_class()

# loop forever
while True:
    # receive up to 1024 bytes from the other computer
    data = connection.recv(1024)
    # decode what we can; a partial letter is kept for the next time around
    text = decoder.decode(data)
    do_something_with(text)&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Much better! Now to raise a &lt;a href="https://github.com/fgimian/paramiko-expect/pull/60"&gt;pull request&lt;/a&gt; against &lt;a href="https://github.com/fgimian/paramiko-expect/blob/136744afeb6d2c462a5da7450b68cde9a9319eca/paramiko_expect.py#L156"&gt;paramiko_expect&lt;/a&gt;.&lt;/p&gt;
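
&lt;p&gt;One loose end is what happens when the stream finishes part way through a letter. The following is a minimal sketch rather than the code from the pull request: &lt;code&gt;decode_chunks&lt;/code&gt; is a hypothetical helper, and the chunks are sliced by hand instead of coming from a real connection. It passes &lt;code&gt;final=True&lt;/code&gt; at the end, which asks the decoder to complain if there are bytes left over.&lt;/p&gt;

```python
import codecs


def decode_chunks(chunks):
    """Decode an iterable of UTF-8 byte chunks, yielding text as it becomes complete."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # final=True makes the decoder raise if the stream ended part way through a letter.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


message = "caf\u00e9 \u20ac5".encode("utf-8")
# Slice the bytes so that both multi-byte letters straddle a chunk boundary.
chunks = [message[:4], message[4:8], message[8:]]
pieces = list(decode_chunks(chunks))
print([ascii(piece) for piece in pieces])
```

&lt;p&gt;With a plain socket, &lt;code&gt;recv()&lt;/code&gt; returns an empty bytes object once the other side has closed the connection, which would be the natural moment to make that final call.&lt;/p&gt;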
&lt;hr class="footnotes"&gt;&lt;ol class="footnotes" style="list-style-type:decimal"&gt;&lt;li&gt;&lt;p&gt;We&amp;#8217;ll not worry about the other side closing the connection or the wifi packing up, for now.&amp;nbsp;&lt;a href="#rf1-178798" class="backlink" title="Return to footnote 1."&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;I do wonder whether questions on Stack Overflow about errors from Python&amp;#8217;s Unicode handling have more views in the aggregate than the &amp;#8220;&lt;a href="https://stackoverflow.com/questions/11828270/how-do-i-exit-the-vim-editor"&gt;How do I exit Vim&lt;/a&gt;?&amp;#8221; question (which is at 2.1 million views as I write this).&amp;nbsp;&lt;a href="#rf2-178798" class="backlink" title="Return to footnote 2."&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;hr&gt;
&lt;p&gt;&lt;i&gt;Originally posted at &lt;a href="https://www.noctua.org.uk/blog/2020/06/03/unicodedecodeerror-with-stuff-from-the-network/"&gt;Name and Nature&lt;/a&gt;. You can &lt;a href="https://www.noctua.org.uk/blog/2020/06/03/unicodedecodeerror-with-stuff-from-the-network/#comments"&gt;comment there&lt;/a&gt; or here.&lt;/i&gt;&lt;/p&gt;</content>
  </entry>
</feed>
