<?xml version='1.0' encoding='utf-8' ?>

<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/' xmlns:atom10='http://www.w3.org/2005/Atom'>
<channel>
  <title>nameandnature</title>
  <link>https://nameandnature.dreamwidth.org/</link>
  <description>nameandnature - Dreamwidth Studios</description>
  <lastBuildDate>Thu, 23 Jul 2020 00:13:06 GMT</lastBuildDate>
  <generator>LiveJournal / Dreamwidth Studios</generator>
  <lj:journal>nameandnature</lj:journal>
  <lj:journaltype>personal</lj:journaltype>
  <image>
    <url>https://v2.dreamwidth.org/8918874/2299717</url>
    <title>nameandnature</title>
    <link>https://nameandnature.dreamwidth.org/</link>
    <width>98</width>
    <height>100</height>
  </image>

<item>
  <guid isPermaLink='true'>https://nameandnature.dreamwidth.org/245928.html</guid>
  <pubDate>Thu, 23 Jul 2020 00:13:06 GMT</pubDate>
  <title>Link blog: format, python, flying, programming</title>
  <link>https://nameandnature.dreamwidth.org/245928.html</link>
  <description>&lt;dl&gt;
&lt;dt&gt;&lt;a href=&quot;https://pyformat.info/&quot;&gt;PyFormat: Using % and .format() for great good!&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;Python string formatting guide.&lt;br /&gt;&lt;small&gt;(tags: &lt;a href=&quot;http://pinboard.in/u:pw201/t:python&quot;&gt;python&lt;/a&gt; &lt;a href=&quot;http://pinboard.in/u:pw201/t:programming&quot;&gt;programming&lt;/a&gt; &lt;a href=&quot;http://pinboard.in/u:pw201/t:format&quot;&gt;format&lt;/a&gt;)&lt;/small&gt;&lt;/dd&gt;
&lt;dt&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=22u4qxm1YjY&quot;&gt;MIT Private Pilot Ground School 2019, F-22 Flight Controls &amp;#8211; YouTube&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;Fascinating talk from an F-22 pilot.&lt;br /&gt;&lt;small&gt;(tags: &lt;a href=&quot;http://pinboard.in/u:pw201/t:aircraft&quot;&gt;aircraft&lt;/a&gt; &lt;a href=&quot;http://pinboard.in/u:pw201/t:physics&quot;&gt;physics&lt;/a&gt; &lt;a href=&quot;http://pinboard.in/u:pw201/t:F-22&quot;&gt;F-22&lt;/a&gt; &lt;a href=&quot;http://pinboard.in/u:pw201/t:flying&quot;&gt;flying&lt;/a&gt; &lt;a href=&quot;http://pinboard.in/u:pw201/t:aviation&quot;&gt;aviation&lt;/a&gt; &lt;a href=&quot;http://pinboard.in/u:pw201/t:military&quot;&gt;military&lt;/a&gt;)&lt;/small&gt;&lt;/dd&gt;
&lt;/dl&gt;
&lt;hr&gt;
&lt;p&gt;&lt;i&gt;Originally posted at &lt;a href=&quot;https://www.noctua.org.uk/blog/2020/07/23/link-blog-format-python-flying-programming/&quot;&gt;Name and Nature&lt;/a&gt;. You can &lt;a href=&quot;https://www.noctua.org.uk/blog/2020/07/23/link-blog-format-python-flying-programming/#comments&quot;&gt;comment there&lt;/a&gt; (where there are currently &lt;img src=&quot;https://www.noctua.org.uk/blog/wp-content/plugins/journalpress/lib/wp-lj-comments.php?post_id=178865&quot; border=&quot;0&quot;&gt; comments) or here.&lt;/i&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=nameandnature&amp;ditemid=245928&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://nameandnature.dreamwidth.org/245928.html</comments>
  <category>format</category>
  <category>link blog</category>
  <category>military</category>
  <category>flying</category>
  <category>f 22</category>
  <category>python</category>
  <category>physics</category>
  <category>aircraft</category>
  <category>aviation</category>
  <category>programming</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
<item>
  <guid isPermaLink='true'>https://nameandnature.dreamwidth.org/243016.html</guid>
  <pubDate>Wed, 03 Jun 2020 17:55:24 GMT</pubDate>
  <title>UnicodeDecodeError with stuff from the network</title>
  <link>https://nameandnature.dreamwidth.org/243016.html</link>
  <description>&lt;p&gt;Occasionally I write about debugging, for the edification of others and to try to explain to muggles what I do all day. I ran into a fun one the other day.&lt;/p&gt;



&lt;h3&gt;Unicode&lt;/h3&gt;



&lt;div class=&quot;wp-block-image&quot;&gt;&lt;figure class=&quot;alignright is-resized&quot;&gt;&lt;img src=&quot;https://www.jwz.org/images/2017/frowning-pile-of-poo.png&quot; alt=&quot;&quot; width=&quot;206&quot; height=&quot;206&quot; /&gt;&lt;/figure&gt;&lt;/div&gt;



&lt;p&gt;Joel Spolsky&amp;#8217;s &lt;a href=&quot;https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/&quot;&gt;explanation of Unicode&lt;/a&gt; is excellent, but long. In brief: on a computer, we represent letters (&amp;#8220;a&amp;#8221;, &amp;#8220;b&amp;#8221; and so on) as numbers. Computers work with zeroes and ones, binary digits (or &lt;em&gt;bits&lt;/em&gt;), usually in groups of 8 bits called &lt;em&gt;bytes&lt;/em&gt;. Back in the mists of time, someone came up with &lt;a href=&quot;https://en.wikipedia.org/wiki/ASCII&quot;&gt;ASCII&lt;/a&gt;, a way to represent decent American letters by giving each letter a number. All those numbers fitted in a single byte (a byte can represent 256 different numbers, and ASCII only needed 128), so one byte was one letter, and all was well&amp;#8230; unless you weren&amp;#8217;t American and wanted to represent funny foreign letters like &amp;#8220;£&amp;#8221;, or some non-Latin alphabet, or a &lt;a href=&quot;https://www.jwz.org/blog/2017/11/unicode-character-frowning-pile-of-poo-u1f979/&quot;&gt;frowning pile of poo&lt;/a&gt;.&lt;/p&gt;
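
&lt;p&gt;Here is a small Python 3 sketch of that one-byte-per-letter arrangement; the strings are just examples. ASCII gives each letter a number, and &amp;#8220;£&amp;#8221; simply has no ASCII number, so Python refuses to encode it:&lt;/p&gt;

```python
# ASCII gives each letter a number from 0 to 127: one byte per letter.
text = "abc"
encoded = text.encode("ascii")
print(list(encoded))              # [97, 98, 99]
print(len(encoded) == len(text))  # True

# A character with no ASCII number, such as "£", cannot be encoded at all.
try:
    "£".encode("ascii")
except UnicodeEncodeError as err:
    print(err)  # 'ascii' codec can't encode character '\xa3' in position 0: ordinal not in range(128)
```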



&lt;p&gt;The modern way of handling those foreign letters and poos is Unicode. Each letter still has a number assigned to it, but there are a lot of them, so the numbers can be bigger than will fit in a byte. Computers still like to work in bytes, so you need to represent a letter using a sequence of one or more bytes. A way of doing this is called an &lt;em&gt;encoding&lt;/em&gt;. One popular encoding, &lt;a href=&quot;https://en.wikipedia.org/wiki/UTF-8&quot;&gt;UTF-8&lt;/a&gt;, has the handy feature that all those decent American letters have the same single-byte representation as they did in ASCII, but other letters get sequences of two to four bytes.&lt;/p&gt;
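
&lt;p&gt;A sketch of the same idea with UTF-8, again with arbitrary example characters. &lt;code&gt;ord()&lt;/code&gt; gives a character&amp;#8217;s Unicode number, and the helper &lt;code&gt;utf8_bytes()&lt;/code&gt; (made up for this example) shows the bytes UTF-8 uses for it, in hexadecimal:&lt;/p&gt;

```python
def utf8_bytes(ch):
    """Hypothetical helper: the UTF-8 encoding of ch as hex strings."""
    return [f"{b:02x}" for b in ch.encode("utf-8")]

# The Unicode number of each character, then the bytes UTF-8 uses for it.
for ch in ["a", "£", "€"]:
    print(ch, ord(ch), utf8_bytes(ch))
# a 97 ['61']
# £ 163 ['c2', 'a3']
# € 8364 ['e2', '82', 'ac']

# Plain ASCII text comes out byte-for-byte the same in UTF-8.
print("abc".encode("utf-8") == "abc".encode("ascii"))  # True
```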



&lt;h3&gt;The Internet&lt;/h3&gt;



&lt;p&gt;The &lt;a href=&quot;https://en.wikipedia.org/wiki/Series_of_tubes&quot;&gt;series of tubes&lt;/a&gt; we call the Internet is a way of carrying bytes around. As a programmer, you often end up writing code to connect to other computers and read data. Suppose we just want to sit there forever doing something with a continuous stream of bytes the other computer is sending us&lt;sup&gt;&lt;a href=&quot;#fn1-178798&quot; title=&quot;We’ll not worry about the other side closing the connection or the wifi packing up, for now.&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;



&lt;div style=&quot;height: 250px; position:relative; margin-bottom: 50px;&quot; class=&quot;wp-block-simple-code-block-ace&quot;&gt;&lt;pre class=&quot;wp-block-simple-code-block-ace&quot; data-mode=&quot;python&quot; data-theme=&quot;monokai&quot; data-fontsize=&quot;14&quot; data-lines=&quot;Infinity&quot; data-showlines=&quot;true&quot; data-copy=&quot;false&quot;&gt;connection = connect_to_the_thing()

# loop forever
while True:
    # receive up to 1024 bytes from the other computer
    data = connection.recv(1024)
    do_something_with(data)&lt;/pre&gt;&lt;/div&gt;
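
&lt;p&gt;To try a loop like that without a second computer, Python&amp;#8217;s &lt;code&gt;socket.socketpair()&lt;/code&gt; returns two sockets already connected to each other. In this sketch one of them stands in for whatever &lt;code&gt;connect_to_the_thing()&lt;/code&gt; (a placeholder, not a real function) would give you, and the other plays the computer at the far end. Unlike the loop above, it stops when the sender closes its end, which &lt;code&gt;recv()&lt;/code&gt; signals by returning an empty bytes object:&lt;/p&gt;

```python
import socket

# Two connected sockets: "sender" plays the other computer,
# "connection" plays what connect_to_the_thing() would return.
sender, connection = socket.socketpair()
sender.sendall("hello, world".encode("utf-8"))
sender.close()  # closing this end is what lets the loop below finish

received = []
while True:
    data = connection.recv(1024)  # up to 1024 bytes at a time
    if not data:                  # b"" means the other side has closed
        break
    received.append(data)
connection.close()

print(b"".join(received))  # b'hello, world'
```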



&lt;p&gt;The data that comes back from the other computer is a series of bytes. What if you know it&amp;#8217;s UTF-8 encoded text, and you want to turn those bytes into that text?&lt;/p&gt;



&lt;div style=&quot;height: 250px; position:relative; margin-bottom: 50px;&quot; class=&quot;wp-block-simple-code-block-ace&quot;&gt;&lt;pre class=&quot;wp-block-simple-code-block-ace&quot; data-mode=&quot;python&quot; data-theme=&quot;monokai&quot; data-fontsize=&quot;14&quot; data-lines=&quot;Infinity&quot; data-showlines=&quot;true&quot; data-copy=&quot;false&quot;&gt;connection = connect_to_the_thing()

# loop forever
while True:
    # receive up to 1024 bytes from the other computer
    data = connection.recv(1024)
    # turn it into text
    text = data.decode(&quot;utf-8&quot;)
    do_something_with(text)&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This seems to work fine, but very occasionally crashes on the line that calls &lt;code&gt;decode()&lt;/code&gt;, with a mysterious error message: &amp;#8220;UnicodeDecodeError: &amp;#8216;utf-8&amp;#8217; codec can&amp;#8217;t decode byte 0xe2 in position 1023: unexpected end of data&amp;#8221;. Whaaat?&lt;/p&gt;



&lt;p&gt;Some frantic Googling of &amp;#8220;UnicodeDecodeError&amp;#8221; turns up a bunch of people getting that error because they weren&amp;#8217;t actually reading UTF-8 encoded text at all, but something else&lt;sup&gt;&lt;a href=&quot;#fn2-178798&quot; title=&quot;I do wonder whether questions on Stack Overflow about errors from Python’s Unicode handling have more views in the aggregate than the “&amp;lt;a href=&amp;quot;https://stackoverflow.com/questions/11828270/how-do-i-exit-the-vim-editor&amp;quot;&amp;gt;How do I exit Vim&amp;lt;/a&amp;gt;?” question (which is at 2.1 million views as I write this).&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. So, you check what the other side is sending, and in this case, you&amp;#8217;re pretty sure it &lt;em&gt;is&lt;/em&gt; sending UTF-8. Whaaat?&lt;/p&gt;



&lt;p&gt;Squint at the error message a bit more, and you find it&amp;#8217;s complaining about the last byte it&amp;#8217;s read. You have to &lt;a href=&quot;https://docs.python.org/3/library/socket.html#socket.socket.recv&quot;&gt;give &lt;code&gt;recv()&lt;/code&gt; a maximum number of bytes to read&lt;/a&gt;, so you picked 1024 (a handy power of 2, as is traditional). &amp;#8220;Position 1023&amp;#8221; is the 1024th byte received (since we start counting from 0, as is traditional). That &amp;#8220;0xe2&amp;#8221; thing is &lt;a href=&quot;https://www.mathsisfun.com/hexadecimals.html&quot;&gt;hexadecimal&lt;/a&gt; E2, equivalent to 11100010 in binary. Read the &lt;a href=&quot;https://en.wikipedia.org/wiki/UTF-8#Description&quot;&gt;UTF-8 stuff&lt;/a&gt; a bit more, and you find that a byte starting 1110 means &amp;#8220;this letter is made up of this byte and the two bytes following it&amp;#8221;. The 1024 bytes ended in the middle of the sequence of bytes which represent a single letter, hence the &amp;#8220;unexpected end of data&amp;#8221; in the error message.&lt;/p&gt;
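
&lt;p&gt;You can reproduce the crash without a network at all. This sketch assumes the unlucky case: the sender&amp;#8217;s text happens to contain 1023 plain ASCII bytes followed by a euro sign, which UTF-8 encodes as the three bytes E2 82 AC, so the first 1024 bytes stop just after the E2:&lt;/p&gt;

```python
# Hypothetical data: 1023 ASCII bytes followed by "€" (E2 82 AC in UTF-8).
sent = ("x" * 1023 + "€").encode("utf-8")
chunk = sent[:1024]  # what an unlucky recv(1024) could hand us
rest = sent[1024:]   # the two bytes a later recv() would return

# The last byte of the chunk in binary: the leading "1110" means
# "this byte and the two bytes after it make up one character".
print(format(chunk[-1], "08b"))  # 11100010

try:
    chunk.decode("utf-8")
except UnicodeDecodeError as err:
    print(err)  # 'utf-8' codec can't decode byte 0xe2 in position 1023: unexpected end of data

# Put back together, the same bytes decode without complaint.
print((chunk + rest).decode("utf-8")[-1])  # €
```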



&lt;p&gt;At this point, if you have control over the other computer, you might be thinking up cunning schemes to ensure that what it passes to each &lt;a href=&quot;https://docs.python.org/3/library/socket.html#socket.socket.send&quot;&gt;&lt;code&gt;send()&lt;/code&gt;&lt;/a&gt; is never more than 1024 bytes, without breaking up a multi-byte letter. After all, the data goes out in &lt;a href=&quot;https://en.wikipedia.org/wiki/Network_packet&quot;&gt;packets&lt;/a&gt;, so what you get when you invoke &lt;code&gt;recv()&lt;/code&gt; must line up with the other side&amp;#8217;s &lt;code&gt;send()&lt;/code&gt;s, right? Wrong.&lt;/p&gt;



&lt;div class=&quot;wp-block-image&quot;&gt;&lt;figure class=&quot;alignright is-resized&quot;&gt;&lt;a href=&quot;https://commons.wikimedia.org/wiki/File:Homing_pigeon.jpg&quot;&gt;&lt;img src=&quot;https://pics.livejournal.com/pw201/pic/000f70dx/s320x320&quot; alt=&quot;&quot; width=&quot;217&quot; height=&quot;174&quot; /&gt;&lt;/a&gt;&lt;figcaption&gt;Avian carrier&lt;/figcaption&gt;&lt;/figure&gt;&lt;/div&gt;



&lt;p&gt;The series of tubes is narrower in some places than others, and your data &lt;a href=&quot;https://en.wikipedia.org/wiki/IP_fragmentation&quot;&gt;may be broken up to fit&lt;/a&gt;. A single &lt;a href=&quot;https://tools.ietf.org/html/rfc1149&quot;&gt;carrier pigeon&lt;/a&gt; can only carry so much weight, you see, and the RSPB is pretty strict about that sort of thing. All that&amp;#8217;s guaranteed is that you get the bytes out in the order they went in, not how many you get out at a time: one &lt;code&gt;send()&lt;/code&gt; can be split across several &lt;code&gt;recv()&lt;/code&gt;s, and several &lt;code&gt;send()&lt;/code&gt;s can come out of a single &lt;code&gt;recv()&lt;/code&gt;.&lt;/p&gt;
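
&lt;p&gt;A toy model makes the point without any real networking. The &lt;code&gt;chop()&lt;/code&gt; helper below is hypothetical: it splits a byte stream into randomly sized pieces, keeping them in order, which is all a stream connection promises. Each piece is then decoded on its own, as the loop above does with each &lt;code&gt;recv()&lt;/code&gt;. The message and the maximum piece size are arbitrary:&lt;/p&gt;

```python
import random

def chop(stream, max_size, rng):
    """Hypothetical stand-in for the network: split stream into pieces
    of 1..max_size bytes, keeping the bytes in order."""
    pieces = []
    while stream:
        size = rng.randint(1, max_size)
        pieces.append(stream[:size])
        stream = stream[size:]
    return pieces

def naive_decode_fails(pieces):
    """Decode each piece separately; True if any piece raises."""
    try:
        for piece in pieces:
            piece.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False

message = "Price: €5, £4, 5€".encode("utf-8")
failures = 0
for seed in range(100):
    pieces = chop(message, max_size=4, rng=random.Random(seed))
    assert b"".join(pieces) == message  # order and content always survive
    if naive_decode_fails(pieces):
        failures += 1

print(f"{failures} of 100 random choppings broke a letter in two")
```

&lt;p&gt;The bytes always join back up into exactly what was sent; only the boundaries move, and decoding piece by piece fails whenever a boundary lands inside a multi-byte letter.&lt;/p&gt;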



&lt;p&gt;Fortunately, &lt;a href=&quot;https://en.wikipedia.org/wiki/Guido_van_Rossum&quot;&gt;Guido&lt;/a&gt; thought of this and blessed us with &lt;code&gt;&lt;a href=&quot;https://docs.python.org/3/library/codecs.html#incrementaldecoder-objects&quot;&gt;IncrementalDecoder&lt;/a&gt;&lt;/code&gt;, which knows how to remember that it was part way through a letter when it left off, so that the next time around the loop, it&amp;#8217;ll hopefully get the rest of the bytes and give you the letter you were hoping for:&lt;/p&gt;



&lt;div style=&quot;height: 250px; position:relative; margin-bottom: 50px;&quot; class=&quot;wp-block-simple-code-block-ace&quot;&gt;&lt;pre class=&quot;wp-block-simple-code-block-ace&quot; data-mode=&quot;python&quot; data-theme=&quot;monokai&quot; data-fontsize=&quot;14&quot; data-lines=&quot;Infinity&quot; data-showlines=&quot;true&quot; data-copy=&quot;false&quot;&gt;import codecs

connection = connect_to_the_thing()

decoder_class = codecs.getincrementaldecoder(&quot;utf-8&quot;)
# Make a new instance of the decoder_class
decoder = decoder_class()

# loop forever
while True:
    # receive up to 1024 bytes from the other computer
    data = connection.recv(1024)
    # decode them, holding back any incomplete letter until next time
    text = decoder.decode(data)
    do_something_with(text)&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Much better! Now to raise a &lt;a href=&quot;https://github.com/fgimian/paramiko-expect/pull/60&quot;&gt;pull request&lt;/a&gt; against &lt;a href=&quot;https://github.com/fgimian/paramiko-expect/blob/136744afeb6d2c462a5da7450b68cde9a9319eca/paramiko_expect.py#L156&quot;&gt;paramiko_expect&lt;/a&gt;.&lt;/p&gt;
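
&lt;p&gt;To watch the incremental decoder at work, feed it the euro sign&amp;#8217;s three bytes (E2 82 AC) in two pieces. The &lt;code&gt;final=True&lt;/code&gt; argument, which the forever-loop above never needs, tells the decoder that the stream has ended, so a dangling half-letter raises an error instead of being held on to:&lt;/p&gt;

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()

# The first piece is only the start of "€", so nothing can be returned yet;
# the decoder keeps the E2 byte until the rest arrives.
print(repr(decoder.decode(b"\xe2")))      # ''
print(repr(decoder.decode(b"\x82\xac")))  # '€'

# If the stream ends part way through a letter, final=True reports it.
decoder.decode(b"\xe2")
try:
    decoder.decode(b"", final=True)
except UnicodeDecodeError as err:
    print(err.reason)  # unexpected end of data
```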
&lt;hr class=&quot;footnotes&quot;&gt;&lt;ol class=&quot;footnotes&quot; style=&quot;list-style-type:decimal&quot;&gt;&lt;li&gt;&lt;p&gt;We&amp;#8217;ll not worry about the other side closing the connection or the wifi packing up, for now.&amp;nbsp;&lt;a href=&quot;#rf1-178798&quot; class=&quot;backlink&quot; title=&quot;Return to footnote 1.&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;I do wonder whether questions on Stack Overflow about errors from Python&amp;#8217;s Unicode handling have more views in the aggregate than the &amp;#8220;&lt;a href=&quot;https://stackoverflow.com/questions/11828270/how-do-i-exit-the-vim-editor&quot;&gt;How do I exit Vim&lt;/a&gt;?&amp;#8221; question (which is at 2.1 million views as I write this).&amp;nbsp;&lt;a href=&quot;#rf2-178798&quot; class=&quot;backlink&quot; title=&quot;Return to footnote 2.&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;&lt;hr&gt;
&lt;p&gt;&lt;i&gt;Originally posted at &lt;a href=&quot;https://www.noctua.org.uk/blog/2020/06/03/unicodedecodeerror-with-stuff-from-the-network/&quot;&gt;Name and Nature&lt;/a&gt;. You can &lt;a href=&quot;https://www.noctua.org.uk/blog/2020/06/03/unicodedecodeerror-with-stuff-from-the-network/#comments&quot;&gt;comment there&lt;/a&gt; (where there are currently &lt;img src=&quot;https://www.noctua.org.uk/blog/wp-content/plugins/journalpress/lib/wp-lj-comments.php?post_id=178798&quot; border=&quot;0&quot;&gt; comments) or here.&lt;/i&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src=&quot;https://www.dreamwidth.org/tools/commentcount?user=nameandnature&amp;ditemid=243016&quot; width=&quot;30&quot; height=&quot;12&quot; alt=&quot;comment count unavailable&quot; style=&quot;vertical-align: middle;&quot;/&gt; comments</description>
  <comments>https://nameandnature.dreamwidth.org/243016.html</comments>
  <category>blog</category>
  <category>python</category>
  <category>network</category>
  <category>unicode</category>
  <category>programming</category>
  <lj:security>public</lj:security>
  <lj:reply-count>0</lj:reply-count>
</item>
</channel>
</rss>
