A Matter of Sorts

So, in a previous post — in fact, it was my last one where I pontificated on Tech Interviews — I said I’d discuss the very popular sorting algorithm. Namely, two very efficient and common ones. Merge Sort and Quick Sort.

I am going to break these down as best I can, in a way that anyone, even people who have never coded before can understand.

So, let’s start off with the question I am sure might be asked. What is a sort? Well, it’s a way of organizing data in an ascending or descending order (generally, it’s ascending, but there are times when going highest to lowest is important).

I will also be using a term here a lot, called an array. This is a fundamental data structure in computer science, in which data is stored. Think of it is either as a sorted or unsorted deck of cards. An unsorted deck of cards might appear to have a random placement of suits and numbers, while a sorted deck will go from Ace to King, Clubs, Ace to King, Diamonds, Ace to King, Hearts, Ace to King, Spades.

Yes, I know Aces are stronger than Kings in most games, but I’m using it here as a placeholder for 0 or 1 depending on your fancy. There is no official suit strength in cards except for specific games and I organized them alphabetically. The array is used in almost every computer programming application when data needs to be organized*. When you organize your books on a shelf or put your clothing in a certain order or walk up and down the aisles of a grocery store, almost everything is sorted in an array like fashion.

We use sorting algorithms in our daily lives without even knowing it. When we organize anything, we’re using a sorting algorithm, which makes their sister algorithms, the search algorithm, much easier to implement in the future.

I actually found myself using a sorting algorithm — albeit a very primitive and subjective one — when I was organizing my closet the other day. My dear wife is always on me to keep my clothes organized and in some semblance of order, so I originally had sections — suits, dress coats, general coats, dress shirts, t-shirts, polo shirts, button down shirts, sweaters, sweatshirts, and finally slacks. Most of my pants (trousers for our friends in the UK and Commonwealth), however, are kept in my dresser. I sorted that as well. It is a bit less organized than my closet, but that’s beside the point.

Let’s use this example of a sort — in fact, I subconsciously used a form of a quicksort to organize my clothes in each of their categories by color or pattern. Lighter clothing is basically the lesser valued (0, this is completely flipped from how its done on computers, but bear with me) on the left to darker colors and blacks on the right.  In fact, it is a modification of the quick sort algorithm due to the fact it is based on hue, more than the highest index value. My pivot is in the middle, rather than the end of the ‘array’ of clothing.

Here is the pseudocode for this sort:

The Nerdographers's clothingSort(clothing set, lighter, darker)
    if(lighter than darker):
        if(clothing color is solid): 
          partition clothing by color / hue  // it should be noted middle red is my 50/50, 
                                             // cooler colors generally end up considered darker
          partition(clothing, lightest color, darkest color)
        if(clothing "color" is patterend): 
          examine prominent/primary color, organize by equivalent solid
          partition(clothing, lightest color, darkest color)       
     clothingSort(clothing set, light, partition)
     clothingSort(clothing set, partition + 1 shade darker, darker)

The Partition(clothing set, lightest, darkest):
    middle hue = average of lightest and darkest / warmest to coolest
    pivot := clothing set's middle hue
    tracker := lightest, one shade lighter

    while lightest <= darkest: 
        if clothing_set[lightest] <= pivot
            tracker + 1
            switch the hues appropriately (darkest to lightest)
    switch one hue darker with the middle hue, continue process
     until completed.

Compare this to the true quicksort:

/* low --> starting index, high --> ending index */
quickSort(array, low, high):
    if(low < high): 
        partition := (low + high) / 2
        // low and high will shift based on the portion 
        // of the array you are working on.
    quicksort(array, low, partition)
    quicksort(array, partitition + 1, high)

partition algorithm (array, low, high): 
    pivot := highest value of the array
    tracker := lowest value - 1 // this is important, as the highest and lowest values change.

    indexer := lowest
    while indexer <= highest value less 1
        indexer + 1
        switch the data at the indexer tracker indices of the array
     
     switch the data at the tracker and highest value of the array
     set the partition index to the tracker + 1

The quicksort is fundamentally a Divide and Conquer algorithm. It breaks down various parts of the data set, in this case, the array into smaller, more manageable portions and exchanges data placements if they are in the improper value order. Unlike its cousin algorithm, mergesort, quicksort does not necessarily break down the array into component pieces. It does everything in place — meaning no new arrays or any extra space is taken up in the computer’s memory, everything that is changed is based on what exists already.

Imagine you are sorting coins by their value. You don’t break them into separate groups, whether by value or because you want smaller manageable groups to work with, you simply move them around based on what their value is relative to their neighbors.  Eventually, you’ll get the right order of coins. Sometimes you will have multiple coins of the same value, sometimes you will sets of coins that all have a separate and discrete value. All the coins will end up in their proper order, regardless.

The critical piece of it is the partitioning behavior.  One thing to keep in mind, that quicksorts consider the value of its pivot, which is declared in the partition, generally based on the highest value given at the time, and use that to sort, rather than going left to right or right to left.

MergeSort:

MergeSort is actually a bit more complex but ultimately just another divide and conquer algorithm. It compares from left to right, rather than highest to lowest internally.

MergeSort(array, left, right):
    middle = (right + left) / 2
    mergeSort(array, left, middle)
    mergeSort(array, middle + 1, right)
    merge(array, left, middle, right)

merge(array, left, middle, right):
    i, j, k // these are tracker variables.
    n1 := middle - left + 1;
    n2 := right - middle;

    Left[N1], Right[N2] // new sub arrays with size equivalent. They may or may not be 
                        // totally equal depending on the size of the initial array.
    while i < size of Left(n1), place next element index into current slot.
    while i < size of Right(n2), place index of m + 1 + j into current slot.

As stated before, these are divide and conquer algorithms. These are especially popular in classical computing because it breaks down data into roughly equal halves and it is much easier for the computer to handle as computers think in a language of ones and zeroes called binary.  3-way, 4-way, even 10-way sorting algorithms exist out there but they are not sufficiently or substantially more efficient as than two-way sorting arrays, therefore are not explored as much more than curiosities and toy problems.

Mergesort breaks down large data sets into smaller composite arrays to increase the speed of the sort. Returning to the coin analogy,  first, you separate coins out into smaller discrete groups, let’s say in this case 4, and you organize them by size. Then you merge two coin groups into one another, and within that group, you sort them again, this should be easier because at least one large coin will be at the end of the sub-array. Keep sorting and merging until everything is in its proper order.

Special Guest – Bubble Sort!

Bubble sort is an often maligned and inefficient algorithm. It is actually quite effective for small data sets (I’d say, 20 or less on modern machines) where the performance of sorts like Quick and Merge might either be overkill or actually less efficient. Bubble sort is something of a walking joke and seen as the ‘young programmer’s first sort’ — and it’s a bad rap to have, but it is a naive solution to a very complicated problem set and it does have its uses.

A bubble sort generally looks like this:

bubbleSort(array, size):

i, j <- 0; // declaring i and j's index starting point

while i < size - 1:
   while j < size - i - 1:
   if array at j is larger than array at j + 1
       switch array at j with array at j + 1

It does not divide and conquer, it simply goes over everything in the array,

You know what would be fun? Maybe I will go and create a kind of anti-sort… a chaos sort, that will take a sorted array and jumble it up. Might be a fun exercise. I’ll try creating this and post it for my next computer science related post, time permitting.

 

*This is actually not always true, there is something called a linked list which often fulfills a similar purpose or an associate array (also frequently called the hash map) that is beyond the scope of this post. Arrays are best for smaller data sets, where time is not of paramount concern.

The Interview

Okay, I know. It’s not science, but let’s talk about something that is very important to the process of being any sort of technical worker or scientist. The Interview. This is important for just about… anyone, in fact, who is seeking a job. Not everyone is cut out to run their own business. I know I’m not and I am not afraid to admit it. Software engineering and computer science is frequently a collective effort to get stuff done. There are no more superheroes. There never were, in fact. That’s a myth. Bill Gates didn’t do it alone. Neither did Steve, Jobs and Wozniak, nor did Mark Zuckerberg. Sergei Brin and Larry Page didn’t either.

The reality for the vast majority of us is: we have to trade our labor and time for compensation.

So, we do that dread practice — the Interview. In this musing, I will be discussing primarily the Whiteboard Interview, something that is a lot of fun and harrowing all at the same time.

And I will admit up front — just like standardized exams and finals, I am quite awful at them. It is not that I don’t know my material, but if I don’t spend considerable time preparing myself mentally for them, I fall flat. I know of people who can walk into an interview and wow everyone. Other people can’t. And that is okay. Whether or not I am predisposed to being ‘bad’ at them due to ADHD… questionable diagnosis, I’m like most boys my age [90s kids] … where my rambunctiousness and need to exercise was mistaken for a cerebrophysical disorder that needed to be medicated… or I just prefer a more leisurely pace, I admit, for all my sternness and passion, I’m actually a more Type B personality. If I have to get up and do something, there needs to be a good reason for it. Or maybe I’m just plain lazy. But laziness is the Mother of Efficiency, so hah.

Whiteboard Interviews (Often Onsite)

So, let’s talk one of my pet peeves and the one interview format I actually do like. That’s right. I listed a pet peeve as something I like… The Whiteboard Interview. This is a very common practice. Many companies are also moving away from them now in favor of technical screens through LeetCode or HackerRank and take-home-coding interviews. I have some issues with that as well, which I will discuss later.

The Whiteboard Interview, alternatively, the Onsite, is the standard bearer, where you and an interviewer, or possibly a panel or interviewers, sit down or stand up and work through a problem to ensure you know your computer science fundamentals and can come up with a viable solution relatively quickly to a presented problem. The biggest benefit to this is that it actually measures your ability to problem solve under pressure, ask intelligent questions, and have a discussion with the interviewer so they can see how you think. The Whiteboard Interview can be anything from a single one on one chat to an all-day event where you work with multiple people across multiple teams or even the same team.

Now, the reason I have a pet peeve with them is the inherent unfairness/bias that can occur with them.

If you get an interviewer that is negatively predisposed to you, as I have in the past, for whatever reason, the odds of your success, even if you rock star, is pretty much nil. Some interviewers also, come from cultures, where rote memorization and perfection are expected, not how you think or how you arrive to your conclusion and this can definitely throw you. This is where I have often stumbled in the past. I will do quite well, get an interviewer that has a specific answer in mind, and will poo-poo anything that isn’t what they were thinking.

If they are this sort of interviewer and you like mergesort and they prefer quicksort? Well, you are probably dead in the water at that point. For those who do not know what merge and quicksorts are, these are two very popular and efficient sorting algorithms. I will go into some depth with them in a later post, but they’re used to sort fairly large, and I say fairly, because even they break down at certain orders of magnitude, arrays of data. Mergesort tends to break things down piece by individual piece, and merge it all back together, where as quicksort relies on pivots and re-arranging around that pivot. For most tasks, they’re fungible and equivalent. There are a few cases in very small and very large data sets where one trumps the other. For most use cases, they are pretty interchangeable in my experience.

This can make Whiteboarding a frustrating and ultimately negative experience for many aspirant developers and I’ve missed out on more than a few opportunities this way. I do not begrudge the companies I interviewed and did not do ‘well’, but it can be a disheartening experience. However, I do highly approve of Whiteboard Interviews for the simple fact that they are the simplest sanity test you can muster. And here’s why…

We currently have a glut of developers that have only a rudimentary understanding of computer science fundamentals, data structures, and algorithms. They’re often trained for the questions, but they do not understand the logic behind them. And I think the Whiteboard Interview often does a FANTASTIC job of weeding those people out. It’s not that they’re undeserving of jobs or development careers, but when interviewing for a major technical corporation, especially those that have mission critical directives where lives or billions of dollars may be at stake, ensuring your engineer has the fundamentals down cold, or at least understands them on more than a superficial level is very important. Many of these aforementioned developers are fantastic full-stack developers and software engineers in their own right, but at certain points in their careers, they are very much more suited for positions in smaller or less mission critical firms until they can bolster their knowledge.

Whiteboard Interviews often help find these people. A good engineer, while they should always practice and expand their knowledge, should be able to come to a reasonable conclusion about the problem in front of them without too much prompting. It is important to ask questions, figure out the system, and the expectations, but regurgitating a solution from a coding website or textbook does not show your ability to problem solve.

This is why I think Whiteboards are the gold standard for interviews, even with all their flaws and the potential false negatives (and positives!) they produce.

The Phone Screen

One can’t talk about Whiteboard or any other interview for that matter without the phone screen. These are often done as a sanity check to make sure your actual skills and personality match what you are presenting before a company takes a risk on flying you out to their site and putting you up in a hotel room. Most of the time, the questions are just as difficult as the ones you will encounter at Whiteboard Interview, but many times, they can be quite misleading, especially if a team or company is planning mass-hirings. Phone screens (and I include video conferences in these) can be just as nerve wracking as the on-site. However, they are absolutely critical for sanity testing. Lots of people can talk a good game, get past initial screens and HR interviews, and then they fall apart on the phone screen. I fumbled my first few due to lack of experience and nerves. Then I got better.

You are also put at the distinct disadvantage of coding in a shared document much of the time. This is all the convenience of programming on your computer, without an IDE and you have to worry about checking yourself, without the automatic muscle memory allowed in a Whiteboard. Aside from these complaints, I actually enjoy phone screens and think they’re one of the most fair ways to screen candidates.

The LeetCode/HackerRank/Online Assessment

I grudgingly do these, only because they’ve become so popular as a screening tool for smaller corporations and many startups. What I dislike about them, is the same reason I dislike certain Whiteboard Interviewers. The metric is based on whether or not you pass all or most of the test cases on the website in question.

Another reason I don’t care for them: these are remarkably easy to cheat on. and it is almost encouraged due to the metric of ‘perfection’, often based on the test cases provided to the website or assessment in question, required to pass them. Most of the test questions given are usually popular ones with easily accessible solutions either in Cracking the Coding Interview, GeeksForGeeks.com, or on various discussion boards. Companies like TripleByte have tried to stop this by disabling the copy and paste function in their IDEs, but most do not, and any savvy aspirant can get around these.

Until they’re given, observed, and the IDE is locked and outside interaction is disabled, I cannot give a positive impression of this style of interview. It generates entirely too many false positives with unscrupulous developers who end up wasting the company’s time on the onsite and too many false negatives for more scrupulous developer who don’t understand the tricks of the system.

The Take-Home Coding Assignment (THCA)

This is a style of interview that many start ups and newer companies use prior to bringing potential employees on site or even hiring them. It also runs the risk of you doing their work, for free. If the assignment takes more than 3-4 hours to accomplish and has a large number of very specific requirements without much leeway in terms of how you come to the solution, be very suspect. Odds are, you are doing work they should have their in-house people doing, for free.

On the other hand, if you enjoy doing things at a more leisurely pace or simulating the kind of problems you would encounter at that particular job, these are great. I have very mixed feelings about the THCA. On the one hand, if you have a dynamic, rapidly growing, company where you do not have a lot of time to do interviews and need to see if someone has the chops to quickly produce the code needed to get the job done, it can be a viable alternative — but you also run the risk of being out some of your time and hard-earned money, especially if you are made to host on a HEROKU or AWS instance. The amount is usually trivial, but if you are at a point where every nickel and dime counts, these can be a bit onerous. Most of these services do have -free- tiers, but they often have limitations that require you to babysit the project constantly which can be better spent looking for different positions.

Another issue I have with THCAs is that it is also quite easy to cheat on these, same with the LeetCode/HackerRank/Online Assessments. If someone has posted their code on Github or another like service, you can skip past a lot of the song and dance on these and potentially get into a position you have no right having. It is one thing to investigate someone else’s approach, it is is another thing to rip it off wholesale and it is on the company interviewing you to do their due diligence to make sure you are not doing this. Unfortunately, many of them do not have the time, and people fly through and waste their and the company’s time.

Conclusions…

Ultimately, interviews are just part and parcel of the experience of getting and keeping a job. They’re often not that much fun, and can be very stressful, but especially in technical careers, they’re an important sanity test. The flaws of the THCA and Online Assessments are the reason I greatly prefer the phone screen followed by an Whiteboard Interview/Onsite, whether in person or virtual.

First Post

Hello,

 

I started this blog to help promote science and technological education. The name “Buckwild Nerdography” may seem evocative, even risqué, but it is more of a tongue in cheek reference to the way that the sciences and rational thinking are being treated in many corners of today’s world. At worst, it is a mildly satirical statement against the irrational fears that make the understanding of science, in particular evolutionary biology, climate change, theoretical physics, and other work that challenges long held dogmas based purely on belief so salacious. At best, it is a open embrace to the world of science and showing that knowledge and understanding are fun.

 

On challenging dogmas… let us not forget that it was not until 1451 when a mysterious flash filled the skies and then a supernova confirmed by Tycho Brahe in 1572 that Europeans even considered the outside world and cosmos anything but largely static at large scale. Everything then was it always had been and always was. The catastrophic death of a distant star was enough to sunder that belief.

crab nebula - supernova Tycho Brahe

Today, we face a situation where what was originally believed to be common sense is called into question, for better or worse. When it flies in the face of settled or primarily settled science, it is a problem. First and foremost, that springs to mind is the anti-vaxxer movement.

 

This has started the resurgence of diseases that were nearly extinct in developed countries, such as the measles, chicken pox, and polio. Yes. Polio is making a comeback. And why? Because a small contingent of individuals would rather have a sickly or dead child than risk autism, all because of an irresponsible doctor who pushed that the preservatives in vaccinations might cause autism in a minority of children — if only to make a name for himself and followed up by the vapid ramblings of a pinup star (no shame in that, but this is NOT someone we should take seriously when performing scientific or medical inquiry) claiming that her child was autistic because of the MMR vaccine. I refuse to identify either, because giving their names might lend credibility. My blog, my rules.

 

The rise in autism has less to do with the increase in vaccinations and more to do with environmental factors such as industrial pollutants, exposure to toxins and chemicals by the parents and the fetus in utero, exposure to microplastics in utero, genetic traits, and expanded diagnostic criteria. Many people are on the Autistic Spectrum and go on to live fruitful and happy lives. Only the most profoundly autistic require lifelong care, and that is a tragedy, but early intervention is key.

 

Long and short of it, get your kid vaccinated. If they are going to be autistic, that started in the womb and even more likely in the genetic memory. Engage with them, spend time with them, help them develop their social skills, and if they do indeed fall on the ASD, please seek interventions. Do not subject us, especially those of us with immunodeficiencies, to diseases that have no business having a resurgence in the modern world.

 

I have gone off a bit on a tangent, but the point is — do not let pseudoscience fool you into making poor decisions. It is my belief that many people shrink from true science because it is malleable and difficult. Science needs to be revised periodically as observations change and new data is gathered. People tend to like things that are hard and fast, and unchanging. There is no greater offense to many minds than change. Science is difficult. Science is hard. It requires patience and so much failure before you come to the right conclusion. And even then, you may need to revise that conclusion. That scares people. Pseudoscience offers easy answers and certainty.

 

A bit about myself, I am not scientist in the truest sense of the word. I have a degree in Computer Science from Oregon State University and I have a liberal arts degree from the University of California, at Berkeley. I understand scientific principles and the scientific method, but my field deals more in a specialized form of mathematics than ‘hard’ science such as physics, biology, or chemistry. Still, I am fascinated with these subjects and voraciously study up on them, even if I have no intention of pursuing higher education in any area beyond Computer Science. However, I do work that requires soundness and proofs of correctness. My ‘science’ is writing and creating algorithms, and writing software. I am an aspiring software engineer, finding mixed success, but always pushing forward and trying to learn more.

 

My day job generally involves building applications for people so they can run their businesses more efficiently or testing software given to me. My evenings are generally preoccupied with my home life, but I do take time to study algorithms and new coding languages to keep my skills sharp. It is a path often filled with doubt and dead ends. I am often stymied and frustrated that I am not learning as quickly as I feel I should. And that is fine. It is normal.

 

Now that introductions are made and you know a bit more about me, let’s get down to brass tacks. A lot of the focus of this blog is going to be on principles of computer science, because the best way to really hammer something down is to teach others about it, and my musings over various developments in the scientific community and my own experiences as a software engineer.

 

I will attempt, on a weekly or bi-weekly basis to write an article or two on something computer science and something else going on in the world to help break it down to the basics that most people can understand. It is my goal to improve scientific literacy and remove a lot of the scary mystique that exists out there about science and scientific inquiry.