Bank OCR

September 14, 2020 ☼ JS

Coding katas are fun to do. They are also a good way to train your problem-solving skills and prepare for technical interviews.

You work for a bank, which has recently purchased an ingenious machine to assist in reading letters and faxes sent in by branch offices. The machine scans the paper documents, and produces a file with a number of entries which each look like this:

    _  _     _  _  _  _  _
  | _| _||_||_ |_   ||_||_|
  ||_  _|  | _||_|  ||_| _|

Each entry is 4 lines long, and each line has 27 characters. The first 3 lines of each entry contain an account number written using pipes and underscores, and the fourth line is blank. Each account number should have 9 digits, all of which should be in the range 0-9. A normal file contains around 500 entries.

Your first task is to write a program that can take this file and parse it into actual account numbers.

You can read the full kata here

Approach

When I solve this sort of problem, I first try to figure out what the constants are. Identifying a few assumptions up front helps simplify the solution. In this case we can find that:

// Each entry is 4 lines long
const LINE_HEIGHT = 4;
// Account number should have 9 digits
const ACCOUNT_NUMBER = 9;
// The digit is a 3x3 matrix
const MATRIX_SIZE = 3;
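
These constants also pin down the shape of a valid entry: 9 digits of width 3 give the 27-character lines mentioned in the kata. As a minimal sketch (the `LINE_WIDTH` constant and the `isWellFormedEntry` helper are hypothetical names and are not part of the solution in this article), a shape check could look like this:

```typescript
const LINE_HEIGHT = 4; // each entry is 4 lines long
const ACCOUNT_NUMBER = 9; // digits per account number
const MATRIX_SIZE = 3; // each digit is a 3x3 matrix

// Derived: 9 digits * 3 columns = 27 characters per line
const LINE_WIDTH = ACCOUNT_NUMBER * MATRIX_SIZE;

// Hypothetical helper: checks that an entry has the expected shape
function isWellFormedEntry(lines: string[]): boolean {
  if (lines.length !== LINE_HEIGHT) {
    return false;
  }
  const digitRows = lines.slice(0, MATRIX_SIZE);
  const separator = lines[LINE_HEIGHT - 1];
  // Digit rows must be full width and contain only spaces, pipes and underscores
  const rowsOk = digitRows.every(
    (row) => row.length === LINE_WIDTH && /^[ |_]*$/.test(row)
  );
  // The fourth line must be blank
  return rowsOk && separator.trim() === "";
}
```

Many editors strip trailing spaces, so a hand-written sample file may contain rows shorter than 27 characters. This sketch rejects those rows on purpose; padding them first would be a reasonable alternative.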

Pseudo code

Before jumping into writing the solution I usually write some pseudo code in order to see if I understood the problem correctly and if I’m going to be able to cover the necessary cases:


- Read the file from disk (a single read is enough; there is no need to stream 500 text records)
- Parse the text digits into numbers:
    - Go through the file line by line
    - Build a map from each digit's text representation to its numeric value
    - Look up each extracted digit in the map
- Store parsed results in an array
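
The "go through the file line by line" step can be sketched on its own. This is a minimal sketch, assuming entries always start 4 lines apart; `groupIntoEntries`, `ENTRY_HEIGHT` and `DIGIT_ROWS` are hypothetical names, and accepting Windows line endings is an extra assumption that the kata does not mention:

```typescript
const ENTRY_HEIGHT = 4; // three digit rows plus one blank separator line
const DIGIT_ROWS = 3;

// Hypothetical helper: splits the raw file text into groups of three digit rows
function groupIntoEntries(text: string): string[][] {
  const lines = text.split(/\r?\n/);
  const entries: string[][] = [];

  for (let i = 0; i < lines.length; i += ENTRY_HEIGHT) {
    const rows = lines.slice(i, i + DIGIT_ROWS);
    const hasContent = rows.some((row) => row.trim() !== "");
    // Skip incomplete or empty groups, e.g. what is left after a final newline
    if (rows.length === DIGIT_ROWS && hasContent) {
      entries.push(rows);
    }
  }
  return entries;
}
```

Because the separator line is never required to be present, this version also keeps the last entry of a file that does not end with a blank line.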

Tip: I always use a dev-ready playground so I don’t waste time setting up tooling and can focus on the solution. I personally use node-typescript-boilerplate, which comes with TypeScript and Jest support.

The solution

First of all, let’s create a txt file with some sample cases:

 _  _  _  _  _  _  _  _  _
| || || || || || || || || |
|_||_||_||_||_||_||_||_||_|

    _  _     _  _  _  _  _
  | _| _||_||_ |_   ||_||_|
  ||_  _|  | _||_|  ||_| _|

Let’s read the file next:

import { promisify } from "util";
import { readFile } from "fs";
import { join } from "path";

const FILE = join(__dirname, "account-numbers.txt");

async function readFromFile(filePath: string) {
  return await promisify(readFile)(filePath, "utf8");
}

(async () => {
  try {
    const rawDocument = await readFromFile(FILE);
  } catch (error) {
    // JSON.stringify(error) prints "{}" for Error objects, so log the error itself
    console.error("Something went wrong", error);
  }
})();

Now we need to parse the numbers. We can think of each digit as a 3x3 cell. We can then assign a real value to each one by mapping its text representation to a number in a dictionary:

interface RawDigit {
  [key: string]: number;
}

// prettier-ignore
// Exported so the parser module can import it
export const dictonary: RawDigit = {
  [
    ' _ ' +
    '| |' +
    '|_|'
  ]: 0,
  [
    '   ' +
    '  |' +
    '  |'
  ]: 1,
  [
    ' _ ' +
    ' _|' +
    '|_ '
  ]: 2,
  [
    ' _ ' +
    ' _|' +
    ' _|'
  ]: 3,
  
// ...
}
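
Writing every key by hand is easy to get wrong, so here is an alternative sketch that derives the same kind of map from a three-row strip containing the digits 0 to 9. The rows are taken from the two sample entries shown in this article (0 from the all-zero entry, 1 to 9 from the kata example). `FONT_ROWS`, `DIGIT_WIDTH` and `buildDictionary` are hypothetical names; this is not how the solution in this article builds its dictionary:

```typescript
const DIGIT_WIDTH = 3;

// Digits 0-9 side by side, 3 columns each (30 characters per row)
const FONT_ROWS: string[] = [
  " _     _  _     _  _  _  _  _ ",
  "| |  | _| _||_||_ |_   ||_||_|",
  "|_|  ||_  _|  | _||_|  ||_| _|",
];

// Hypothetical helper: flattens each 3x3 glyph into a 9-character key
function buildDictionary(fontRows: string[]): Record<string, number> {
  const glyphs: Record<string, number> = {};
  for (let digit = 0; digit <= 9; digit++) {
    const start = digit * DIGIT_WIDTH;
    const key = fontRows
      .map((row) => row.slice(start, start + DIGIT_WIDTH))
      .join("");
    glyphs[key] = digit;
  }
  return glyphs;
}

const generatedDictionary = buildDictionary(FONT_ROWS);
```

The keys have the same shape as the hand-written ones: three rows of three characters concatenated from top to bottom.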

After that we need to write a parser that transforms the raw text digits into an account number:

import { dictonary } from "./dictionary";

const LINE_HEIGHT = 4;

// accountSnapshot is the result of the readFile operation
export function accountReader(accountSnapshot: string): string[] {
  const buffer: string[] = [];
  const accountNumbers: string[] = [];

  // Go through accountSnapshot and process an entry every 4 lines
  accountSnapshot.split("\n").forEach((line: string) => {
    buffer.push(line);

    if (buffer.length === LINE_HEIGHT) {
      /*
        Buffer (after joining) at this point will look like this:

         _  _  _  _  _  _  _  _  _
        | || || || || || || || || |
        |_||_||_||_||_||_||_||_||_|

        We send the string to the parseRawDigit function
      */
      accountNumbers.push(parseRawDigit(buffer.join("\n")));
      buffer.length = 0;
    }
  });

  return accountNumbers;
}

function parseRawDigit(rawAccountText: string) {
  let parsedAccount = "";
  /*
    We know that an account number has 9 digits, so we parse each digit by passing its position and the whole entry
  */
  for (let digitPlace = 0; digitPlace < 9; digitPlace++) {
    parsedAccount += dictonary[extractRawDigit(digitPlace, rawAccountText)];
  }
  return parsedAccount;
}

function extractRawDigit(position: number, accountText: string) {
  const accountLines = accountText.split("\n");

  /*
    accountLines at this point looks like this:

    [
        '    _  _     _  _  _  _  _',
        '  | _| _||_||_ |_   ||_||_|',
        '  ||_  _|  | _||_|  ||_| _|',
        ''
    ]

    Each array item is one row of the entry. For each of the first 3 rows we take the 3 characters that belong to the digit at `position` to build its 3x3 matrix
  */

  let extractedRawDigit = "";
  [0, 1, 2].forEach((lineNum) => {
    const startPos = position * 3;
    const endPos = (position + 1) * 3;
    extractedRawDigit += accountLines[lineNum].slice(startPos, endPos);

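    /*
      Once all three rows have been appended, extractedRawDigit holds the
      3x3 matrix flattened into a 9-character string, for example:

        '     |  |'  (1)
        ' _  _||_ '  (2)
        ' _  _| _|'  (3)

        ... etc

      These strings can easily be compared to the keys of our dictionary, as you can see in parseRawDigit()
    */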
  });
  return extractedRawDigit;
}
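
One thing to be aware of: when a 3x3 cell is not in the dictionary, the lookup returns `undefined`, and string concatenation turns that into the text "undefined" inside the account number. Below is a minimal sketch of a variant that emits a placeholder instead. The `?` placeholder, the padding of short rows, and the names `KNOWN_GLYPHS`, `padRight` and `decodeEntry` are all assumptions made for illustration; only the four glyphs spelled out earlier are included.

```typescript
// Only the four glyphs written out in the dictionary above
const KNOWN_GLYPHS: Record<string, number> = {
  " _ | ||_|": 0,
  "     |  |": 1,
  " _  _||_ ": 2,
  " _  _| _|": 3,
};

const GLYPH_WIDTH = 3;
const UNKNOWN_DIGIT = "?"; // assumption: placeholder for unreadable digits

// Pads a row with spaces so rows with stripped trailing whitespace still slice correctly
function padRight(row: string, width: number): string {
  let padded = row;
  while (padded.length < width) {
    padded += " ";
  }
  return padded;
}

// Hypothetical variant of parseRawDigit that never emits "undefined"
function decodeEntry(rows: string[], digitCount: number): string {
  let account = "";
  for (let place = 0; place < digitCount; place++) {
    const start = place * GLYPH_WIDTH;
    let glyph = "";
    for (let r = 0; r < GLYPH_WIDTH; r++) {
      const row = padRight(rows[r] || "", digitCount * GLYPH_WIDTH);
      glyph += row.slice(start, start + GLYPH_WIDTH);
    }
    const digit = KNOWN_GLYPHS[glyph];
    account += digit === undefined ? UNKNOWN_DIGIT : String(digit);
  }
  return account;
}
```

Passing the digit count as a parameter keeps the sketch usable with short test inputs; for real entries it would be 9.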

Here’s the final code:

(async () => {
  try {
    const rawDocument = await readFromFile(FILE);
    const parsedAccountNumbers: string[] = accountReader(rawDocument);
    console.log(parsedAccountNumbers.join("\n"));
  } catch (error) {
    // JSON.stringify(error) prints "{}" for Error objects, so log the error itself
    console.error("Something went wrong", error);
  }
})();

I haven’t really invested much time into making the solution performant. There’s probably a better way to solve it, but I think the code is simple enough, and that’s usually a good sign ;)


If you have any suggestions, questions, corrections or if you want to add anything please DM or tweet me: @zanonnicola


