DavidTimms/PythonXBMImage: Difference between revisions

Revision as of 11:24, 28 December 2016

X BitMaps: extract image data and display in ascii terminal

Intro

While being distracted from my previous distractions from an earlier distraction (fonts), I was intrigued by: The Great 202 Jailbreak - Computerphile (https://www.youtube.com/watch?v=CVxeuwlvf8w)

The report Revisiting a Summer Vacation: Digital Restoration and Typesetter Forensics included a link to an archive made available of Martin W. Guy's backup to tape from the 80s, where the authors found some data they used either directly or to confirm their earlier guesses about construction of the document. This appears to have taken about 6-8 weeks of work to rebuild one printed report from various information they were able to find or still had in hand. But I digress.

Within the archive index were images described as: Mike Hawley's collection of tiny X bitmaps (Dec 1988), including Brian Kernighan (http://medialab.freaknet.org/martin/tape/stuff/bitmaps/face/bwk).

Unknown image type

After clicking the extension-less file I saw:

#define bwk_width 48
#define bwk_height 48
static char bwk_bits[] = {
0x00, 0x00, 0xc0, 0x3f, 0x00, 0x00, 
0x00, 0x00, 0xf8, 0xea, 0x01, 0x00, ...

Hoping to find an application that could display this source code as an image, I saved it to disk and tried the file command, which reported: bwk.image.c_source: ASCII text. Seeing that this was C source code, I assumed it was meant to be compiled directly into a larger C application. What I could have done was attempt to identify the file with:

test	result
ffprobe	bwk.image.c_source: Invalid data found when processing input
gimp	bwk.image.c_source' failed: Unknown file type
imageinfo	XBM X Windows system bitmap (black and white) 1850 8 48x48 using: imageinfo --format --fmtdscr --size --depth --geom bwk.image.c_source
imagemagick identify	XBM 48x48 48x48+0+0 8-bit sRGB 2c 1.85KB 0.000u 0:00.000
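
The header lines are what give the format away. A minimal sketch of a content check, assuming an XBM file always carries a `<name>_width` define, a `<name>_height` define and a `<name>_bits[]` array; the function name looks_like_xbm and the sample string are hypothetical and not part of any tool listed above:

```python
import re

def looks_like_xbm(text):
    """Heuristic check: width define, height define and a _bits[] array are all present."""
    has_width = re.search(r'#define\s+\w+_width\s+\d+', text) is not None
    has_height = re.search(r'#define\s+\w+_height\s+\d+', text) is not None
    has_bits = re.search(r'\w+_bits\s*\[\]\s*=', text) is not None
    return has_width and has_height and has_bits

# Hypothetical sample built from the first lines of the file shown earlier
sample = (
    "#define bwk_width 48\n"
    "#define bwk_height 48\n"
    "static char bwk_bits[] = {\n"
    "0x00, 0x00, 0xc0, 0x3f, 0x00, 0x00,\n"
    "};\n"
)
print(looks_like_xbm(sample))          # True
print(looks_like_xbm("hello world"))   # False
```

This is only a heuristic: it says nothing about whether the pixel data that follows is complete or well formed.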

Python workout

Given the things I tried hadn't made me any the wiser, I considered starting a c app, to include the file and code something to view it somehow. Expanding my python skills was more important, so I began looking at the structure of the file to plan how to proceed:

read the file
get the width
get the height
get the image data
transform / feed into an image creation library to create a png/bmp - whatever was easiest.
not knowing about the file format I decided to also grab the filename (bwk) from the defines, assuming that you could define more than one image in a file, and you need to pick the right defines and data for a single file.

Python development environment

Half the problem is finding and setting up a dev env to speed up development. I started with: python3, gedit, gnome-terminal, firefox (google, python manual, stackoverflow). Hacking involved trying stuff in the python3 interpreter, and then copying and pasting it into my code.py in gedit.

Later I started using the bluefish editor, with a custom command for python:

gnome-terminal --geometry=100x50+1200+0 --working-directory='%c' -e "bash -c \"python3 '%f'; read -n1 junk\""

Clicking Python starts the terminal in the correct directory, runs python3 on the file in the editor, and pauses the terminal output until a key is pressed, which is necessary to see interpreter messages and my hacking output.

Read the File

Getting the text of the file into a string in memory was easy:

# Read the whole file into one string; the with block closes the file afterwards
with open('bwk.image.c_source.txt') as fhand:
  sDataRaw = fhand.read()

Regular Expressions

I learnt a lot about regexes by using the re module, and then the extended regex module, to detect conforming file content. The online regex builder/tester (https://regex101.com/) was useful. At first I tried to match the two #define lines and extract the match group data, leaving the pixel data for a second regex.

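import re
...
# The built-in re module does not support POSIX classes such as [[:alpha:]],
# so plain character ranges are used here.
matchobject = re.search(r'#define ([A-Za-z]{1,3})_([A-Za-z]{4,6}) ([0-9]{1,2})', sDataRaw)
if matchobject:
  print(matchobject)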

However, this would only show the first match. I extended this to match the overall file structure, extracting: imagename1, metric1, value1, imagename2, metric2, value2, imagename3, followed by data that looked like a C array of 0xab hex values. Since I needed multiple matches, I changed to the regex library instead:

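import regex
...
# Raw string so that \[ and \} reach the regex engine without Python escape warnings
pattern = regex.compile(r'#define ([[:alpha:]]{1,8})_([[:alpha:]]{4,6}) ([[:digit:]]{1,2})\n.*#define ([[:alpha:]]{1,8})_([[:alpha:]]{4,6}) ([[:digit:]]{1,2}).*static char ([[:alpha:]]{1,8})_bits\[\] *= *.*[, \n0x[:xdigit:]]+\};', regex.DOTALL)
# findall returns a list of 7-tuples, one per image (inferred from the indexing used later)
matchobject = pattern.findall(sDataRaw)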
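
For comparison, a minimal sketch that collects every #define with the standard re module and finditer, rather than one large pattern. It assumes each define has the form `<name>_<metric> <integer>`; the helper name parse_defines and the sample text are hypothetical.

```python
import re

# One #define line: image name, metric (width/height/...), integer value
DEFINE_RE = re.compile(r'^#define\s+(\w+?)_([A-Za-z]+)\s+(\d+)\s*$', re.MULTILINE)

def parse_defines(text):
    """Return {image_name: {metric: value}} for every matching #define line."""
    images = {}
    for match in DEFINE_RE.finditer(text):
        name, metric, value = match.groups()
        images.setdefault(name, {})[metric] = int(value)
    return images

sample = "#define bwk_width 48\n#define bwk_height 48\nstatic char bwk_bits[] = {\n"
print(parse_defines(sample))   # {'bwk': {'width': 48, 'height': 48}}
```

Grouping by image name keeps the width and height of each image together, whatever order the defines appear in.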

Pixel Data

For the image pixeldata, I created the pattern, and used regex.findall to return a list of strings containing two-character hex codes.

hexbytespattern = regex.compile('0x([[:xdigit:]]{2})', regex.DOTALL)
matchhexobject = hexbytespattern.findall(sDataRaw)
print(matchhexobject)

I stopped experimenting with regexes once I could see the example data had been correctly extracted:

['00', '00', 'c0', '3f', '00', '00', '00', '00', 'f8', 'ea', '01', '00', '00', '00', ...]

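The DOTALL flag was important for the file-structure pattern, where .* has to keep matching past newline characters. The hex-byte pattern contains no . at all, so the flag makes no difference there.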

Identifying the Image Name and Size

I added some logic to grab the first imagename1, and confirm it is the same as the other two imagenames. Also, I needed the width of the image to work out how to arrange the data when displaying the image. The height was also available, so I extracted that to an integer as well.

  sImageName = matchobject[0][0]
  if(sImageName != matchobject[0][3]
      or sImageName != matchobject[0][6]):
    print('  ImageName={0} does not match'.format(sImageName))
  else:
    print('  ImageName={0}'.format(sImageName))
    if matchobject[0][1] == 'width':
      nImageWidth = int(matchobject[0][2])
    elif matchobject[0][4] == 'width':
      nImageWidth = int(matchobject[0][5])
    else:
      nImageWidth = 0
      print('  ImageWidth not defined')

    if matchobject[0][1] == 'height':
      nImageHeight = int(matchobject[0][2])
    elif matchobject[0][4] == 'height':
      nImageHeight = int(matchobject[0][5])
    else:
      nImageHeight = 0
      print('  ImageHeight not defined')
    nPixels = nImageWidth * nImageHeight
    print('  ImageSize={}x{} = {} pixels'.format(nImageWidth, nImageHeight, nPixels))

Convert List of Strings to a Binary Array

Given python has no C-style arrays built into the language (lists and the array module aside), I was interested to find the bytes and bytearray types in python3. I also found a bitarray library to do some of the work for me.

    pixelbytestring = ''.join(matchhexobject)
    print(pixelbytestring)

import binascii
...
    pixelbytes = binascii.unhexlify(pixelbytestring)
    print(pixelbytes)

from bitarray import bitarray
...
    pixelarray = bitarray()
    pixelarray.frombytes(pixelbytes)
    print(pixelarray)

    BitmapDraw(nImageWidth, nImageHeight, pixelarray)

It took some time to find a way to convert the list of two-character hex strings into the single bytes object that bitarray needed. First I joined the list into a single string with str.join. Next, binascii.unhexlify converted that text into bytes. Then bitarray.frombytes converted this into the bitarray format, e.g. '1001011110100101'. Now this can be passed to a function which takes the width, height and raw binary pixelarray.
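
The same conversion can be sketched with the standard library alone, in case bitarray is not installed. This assumes the list of two-character hex strings shown earlier; hex_list_to_bits is a hypothetical helper, and the bit string is written most significant bit first, which matches bitarray's default (big-endian) ordering.

```python
def hex_list_to_bits(hex_strings):
    """Join two-character hex strings, decode to bytes, and expand to a '0'/'1' string."""
    pixelbytes = bytes.fromhex(''.join(hex_strings))
    # format(b, '08b') writes each byte as 8 binary digits, most significant bit first
    return pixelbytes, ''.join(format(b, '08b') for b in pixelbytes)

pixelbytes, bits = hex_list_to_bits(['00', '00', 'c0', '3f', '00', '00'])
print(pixelbytes)   # b'\x00\x00\xc0?\x00\x00'
print(bits[16:32])  # 1100000000111111
```

A plain string of '0' and '1' characters is wasteful for large images, but for a 48x48 bitmap it is easy to print and inspect.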

Drawing the Image

At first I thought that pouring my extracted image data into some format to create a bitmap, png or other image file type would be easy, and it may well be. But remembering some ascii-art from the early 80's, I decided to try my hand at that. I started with on bits as X and off bits as .
By iterating through the pixelarray, a string containing X or . for each pixel state was generated. I wasn't sure how the bytes had been arranged, and suspected that the data might be stored column-wise: the leftmost column of 8-bit groups first, then the second column of 8-bit groups, and so on.
Yet the simplest implementation was to assume row order: the first byte holds the first 8 pixels of the top line, the second byte holds pixels 9-16 of the top line, and so on across. Trying this, I could see an image which was somewhat face-like, but too narrow. I changed the drawing to print each symbol twice. This made the image almost square, but while there was structure, the image didn't seem continuous. By inserting a | after each 8 bits, I could see that each 8-bit group needed to be flipped left to right.

|####............|....############|................|................|
|................|##########......|######..##..##..|..............##|
|................|##############..|################|............####|
|................|################|################|........########|
|####............|################|##########..####|........########|
|####............|################|##..##########..|......##########|
|########........|##........######|################|....############|
|########........|##..........####|################|..##############|
|##########......|................|########..##....|..##############|
|..########......|................|############..##|################|
|....########....|................|########..##....|################|

Devising a bit-reversing filter for each 8-bit group was next on the agenda, until I remembered seeing an endian setting for bitarray.

    pixelarray = bitarray(endian="little")
    pixelarray.frombytes(pixelbytes)
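
For reference, the bit-reversing filter that the endian setting made unnecessary could be sketched like this, using only the standard library. reverse_bits and REVERSE_TABLE are hypothetical names; the sketch assumes, as the grouped output above suggests, that the lowest bit of each byte is the leftmost pixel.

```python
def reverse_bits(byte):
    """Reverse the order of the 8 bits in one byte value (0-255)."""
    return int(format(byte, '08b')[::-1], 2)

# Precompute all 256 results once so translate() can flip a whole buffer
REVERSE_TABLE = bytes(reverse_bits(b) for b in range(256))

pixelbytes = bytes([0x00, 0x00, 0xc0, 0x3f, 0x00, 0x00])
flipped = pixelbytes.translate(REVERSE_TABLE)
print(flipped.hex())   # 000003fc0000
```

Feeding the flipped bytes to a big-endian bitarray should give the same pixel order as feeding the original bytes to a little-endian one.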

Close up, the result was not impressive, but after taking 5 steps back I could see the bearded face of Brian Kernighan. I also tried other characters ([XXoo], [##oo], [@@  ], [##--], etc.), but stuck with [##..]:

......##########....................##..######################..................
......########..................##..############################................
....########........................##..##########################..............
....########..........................##..############..##########..............
..##########....................................##..##############..............
..########..........................................##..######..####............
....########..............................................##########............
..########............##..##............########........############............
..############..############..##......##......##..##......######..##............
....######..............................######....##..##..##########............
....##################..################..##############..##########............
....######....########..####..##......######..########....##..####..............

Post

You don't need to go to this extent at home; further research found that this is an ancient yet well-known file format, X BitMap, usually given the extension xbm. After renaming the file with that extension, ffprobe, imageviewer and gimp are able to open the image. On today's displays the image is very small on screen, and it is difficult to make out the face. Zooming made it bigger, but rougher; I was still not really able to make out the face.
In fact, the ascii art version with the eye of the beholder at 4 metres may be the most effective display.
