Monday, March 16, 2020

Dear Blogspot,

The other day I was linked to a forum discussion on the correct way to reverse the sign of an integer (e.g., make -5 into 5, or 5 into -5). I was so enthralled by the discussion that I decided to write my own solution:

public static int reverseSign(int x) { /* x is the input number */
    // Convert the input number to its string form.
    String srcNum = Integer.toString(x);
    int srcNumIndex = 0;
    int startOfNumber = -1;
    // Scan for a leading minus sign; if one is found, the digits start just after it.
    for (int i = srcNum.length() - 1; i >= 0; i--) {
        if ((srcNum.charAt(i) == '-') && (i < srcNum.length() - 1)) {
            startOfNumber = i + 1;
            srcNumIndex++;
            break;
        }
    }
    // Special case: "0" keeps no sign at all.
    if ((srcNum.length() == 1) && (srcNum.charAt(0) == '0')) {
        startOfNumber = 0;
    }
    // Build the output characters, prepending '-' when the input had no sign.
    char newChars[] = new char[srcNum.length() - startOfNumber];
    int newCharIndex = 0;
    if (startOfNumber < 0) {
        newChars[0] = '-';
        newCharIndex = newCharIndex + 1;
    }
    for (int i = srcNumIndex; i < srcNum.length(); i++) {
        if (newCharIndex < newChars.length) {
            newChars[newCharIndex] = srcNum.charAt(i);
            newCharIndex = newCharIndex + 1;
        }
    }
    // Parse the rebuilt string back into an int.
    String finalString = new String(newChars);
    return Integer.parseInt(finalString);
}

(C++ version available upon request)
An alternative algorithm for this function might look like:

public static int reverseSign(int x) {
    return -x;
}

So why did I write that first monstrosity? Before you think I'm crazy Blogspot, hear me out.
Programmers must be natural economizers. Despite the great strides in CPU speed, memory, and storage capacities, they remain finite resources. Since economics is, among other things, the study of human behavior in the presence of finite goods, it seems fitting that programmers think carefully about how they utilize these resources.


Wednesday, June 21, 2017

Sorry for the low blogging throughput...

Dear Blogspot,



While working the other day, I was watching the throughput meter in a downloader/patcher as it moved about a gig of data from the internet to my hard drive.  The meter was showing me how many bytes per second I was downloading.  There was also a wonderful little widget for limiting the throughput.  I could choose options from 64 kilobytes per second all the way up to 4 megabytes per second, and Unlimited.  It is this widget and how it works that I'd like to tell you about.

I'm sure the algorithm used to produce the above graph is more clever, but the most straightforward network rate limiter basically works by the following steps (a rough sketch in Java follows the list):

1. Observing the recent throughput (how many bytes have come in lately?)

2. Extrapolating future throughput over a set time period (at this rate, how many bytes will I download by the time 1 minute has passed?).

3. Pausing the incoming traffic, by simply refusing to read it, until new extrapolations of future throughput match the desired throughput.

4. Repeating.
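
Here is a minimal sketch of that loop in Java.  This is not the actual algorithm from the downloader; the class name, the stream arguments, the buffer size, and the window length are all assumptions made for illustration.  It simply measures how many bytes have arrived recently and refuses to read again until the elapsed time catches up with the desired rate.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class RateLimitedCopier {

    // Copy bytes from in to out, refusing to read whenever the observed
    // throughput would exceed maxBytesPerSecond.
    public static void copy(InputStream in, OutputStream out, long maxBytesPerSecond)
            throws IOException, InterruptedException {
        byte[] buffer = new byte[8 * 1024];        // assumed read size
        long windowStart = System.currentTimeMillis();
        long bytesThisWindow = 0;

        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            bytesThisWindow += read;

            // Steps 1 and 2: observe the recent throughput and extrapolate how long
            // this many bytes *should* have taken at the desired rate.
            long elapsedMs = System.currentTimeMillis() - windowStart;
            long targetMs = (bytesThisWindow * 1000) / maxBytesPerSecond;

            // Step 3: pause (simply refuse to read) until reality catches up
            // with the desired rate.
            if (targetMs > elapsedMs) {
                Thread.sleep(targetMs - elapsedMs);
            }

            // Step 4: repeat; reset the window occasionally so old history
            // does not dominate the estimate.
            if (elapsedMs > 10_000) {
                windowStart = System.currentTimeMillis();
                bytesThisWindow = 0;
            }
        }
    }
}

A call like copy(networkStream, fileStream, 64 * 1024), with whatever streams you happen to have (the names here are hypothetical), would cap the transfer at roughly 64 kilobytes per second.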

For example, suppose I cannot help but eat one peanut per second, for an expected maximum peanut throughput of 60 peanuts per minute, but I only want to eat 20 peanuts per minute.  The way to accomplish this would be to tape my mouth shut and bind my hands for 2 full seconds for every 3 seconds available, so that only once every 3 seconds am I actually able to eat a peanut.  Since my normal eating rate is one peanut per second, by preventing me from getting at the peanuts 2/3 of the time, I have successfully reduced my peanut consumption throughput to 20/minute.

If you were to graph my throughput-limited peanut-eating, it would be just as jagged: short spikes of consumption separated by flat stretches of nothing.


Peanuts are a good analogy because, as with network downloads, the receiver has little to no control over the size of each incoming package of data.  This limits the receiver (or peanut eater) to either consuming or not consuming; partial consumption is not really an option.  This leads to very jagged graphs, as described above, because consumption is always followed by deprivation in order to control the overall throughput.  And even if one could somehow eat 1/3 of a peanut per second (try it, it's not easy), that would definitely not work for download data, since there is no way to ask a transmitter to send only a partial byte per packet.

So, what has surprised me in the past is how alarming a graph like the above can be to folks.  Why can't an accurate graph of limited throughput be smooth?  When I want to limit my rate of walking, I don't walk normally and then pause every second before continuing to walk for another second.  I just walk slower, in one fluid motion.  Why can't the graph be like that?

Well, the graph can certainly LOOK like that, if one, for example, graphs the median throughput over a period instead of the actual bytes or peanuts consumed during each atomic unit of time.  But that wouldn't quite be the truth about what is going on.  It would be covering up the mechanics of reality with statistical lies.  It would be pretending that something happened during an atomic unit of time that really didn't happen at ALL.  I once wasted an hour trying to explain this to some folks, and that didn't go well.  They just wanted the data to be slower, and not pause all jaggedy like that.

The size of each incoming package of peanuts or data, and the limited ways in which the receiver can react to it, dictate both the algorithm one must use to limit the rate and the way an accurate graph of throughput over time will look.

All that said, my advice is this: screw accuracy; don't try to explain or show any of this to anyone.  Just use a smooth-averaging algorithm to hide the jagged edges of reality when making your throughput graphs.  It's much easier to do this than to try to explain why those toothy fangs are more accurate.  Besides, the graph is meant to convey a general rate of data, not provide a window into actual data movement along your i/o bus.
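
For what it's worth, the smoothing itself is trivial.  A sliding-window average like the sketch below (the class and method names are mine, and the window size is an arbitrary assumption) will turn the teeth into the gentle curve people expect:

public class ThroughputSmoother {

    // Smooth raw per-second byte counts with a simple sliding-window average.
    // A larger window gives a smoother (and less honest) curve.
    public static double[] smooth(long[] bytesPerSecond, int windowSize) {
        double[] smoothed = new double[bytesPerSecond.length];
        long runningSum = 0;
        for (int i = 0; i < bytesPerSecond.length; i++) {
            runningSum += bytesPerSecond[i];
            if (i >= windowSize) {
                runningSum -= bytesPerSecond[i - windowSize]; // drop the sample leaving the window
            }
            int samplesInWindow = Math.min(i + 1, windowSize);
            smoothed[i] = (double) runningSum / samplesInWindow;
        }
        return smoothed;
    }
}

Plot the smoothed series instead of the raw samples and the jagged pauses disappear, which is precisely the statistical fib being recommended above.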


Tuesday, December 18, 2012

The Risk of Bugs

Dear Blogspot,

I've lately been thinking about how economists view risk, and how that analysis can help us think about software testing.

Economists refer to risk in a rather ordinary way, such as talking about the risk of getting into a car wreck, the risk of a particular stove causing a fire, the risk of being killed by Dracula, or the risk of your house being flooded.  A "risk" then is some unhappy event that we wish to avoid, since we find it in our interests to be happier rather than more unhappy.   Risk-mitigation is an economic "good", something we value and pursue.

Economists approach choices in risk-mitigation the same way they approach their analysis of choice regarding any economic good -- subjectively and marginally.  Those all-important economic terms mean that each person might have a different tolerance level for risk (the subjective part), and that we weigh each risk-mitigation opportunity according to the cost of obtaining it (marginal cost).

Because of this, economics rarely predicts a zero-risk situation where people are involved.  There are many reasons why.  The first is a knowledge problem: every possible risk is not instantly obvious.  The world is a complex place, and lots of unpredictable things happen.  The second reason was alluded to earlier, that some risks are just too costly to eliminate given our subjective preferences.  For example, we might worry about the chance of a meteor falling through our ceiling and striking us on the head while we sleep, and calculate that the only way to be completely safe is to live deep in the roots of a tall mountain.  The last reason is that the desire for safety from risk is unbounded.  For example, if we get ourselves a new robot, and find that it is homicidal, we might worry a lot about being killed and have it reprogrammed with the "Three Laws" to prevent that.  Afterwards, however, we might start worrying about it breaking our dishes while trying to clean them, and have its metal hands padded.  After that, we might eventually even worry about running into it in the dark and stubbing our toe, and so forth, and so on.

As you might have guessed, this exact same analysis can be applied to flaws in the implementation of a specific software design, commonly called a "bug".  A user encountering the effects of a bug is definitely an event that they wish to avoid, and (for various reasons) that we as engineers do not wish to introduce.  Software houses typically employ QA to try and discover these flaws before a user finds them, so that they can be fixed.  Other mitigation practices, such as Test Driven Development (TDD), unit tests, test automation, and so forth are also used to prevent users from encountering bugs in their software.

However, Blogspot, I would argue that pursuing a zero-bug situation is neither wise nor practical, for all the same reasons mentioned previously.  Modern software systems are complex interactions of many different systems, and user interactions with software can introduce further uncertainty, making the task of even predicting every possible bug all but impossible.  Further, sometimes a bug is simply not worth fixing.  You may get a bug report that the software cannot handle data buffers larger than any reasonably available storage device, which no reasonable engineer would bother to fix.  On the other hand, we may discover a bug that has a very low impact on the user, and a low chance of occurrence, but find that it will take hundreds of man-hours of engineering to correct, making such a correction too costly to pursue.  An example of this might be that your software is vulnerable to a flaw in a specific model of video card that changes the visible shade of red slightly, and that the cost of writing code to detect and work around this hardware flaw isn't worth a slight color difference.  Lastly, the desire to be free of bugs is irrationally unbounded.  A quickly fixed bug that causes the software to crash on boot-up is an easy call, but eventually one would start worrying about software flaws that occur only in certain rare hardware environments -- it would never end.

Well, Blogspot, that is all I have to say on this subject.  As in my other posts, I would encourage my fellow engineers to think about our problems the same way we actually end up deciding them: by weighing the costs and the benefits of each choice.  Demanding a zero-bug software product may make one seem bold and principled, but it is only a way to set oneself up for disappointment, while simultaneously foregoing time that could have been spent on new products or on more important improvements to existing ones.

Sunday, July 22, 2012

The Law of Comparative Testing

Dear Blogspot,

Two hundred years ago, David Ricardo taught the world about the Law of Comparative Advantage.  It states that any two nations, groups, or persons who have differing relative productivity in the production of different goods always benefit from engaging in their most relatively productive activities, and then trading the produce thereof with each other.  It is a direct challenge to protectionism on a national scale, and self-sufficiency on a personal one.

This principle applies to the software world rather clearly when one considers the role of Developer and Tester (usually called Quality Assurance or "QA").  To explain both what the Law is, and how it applies to our subject, consider the following example:

Suppose a Developer is capable of producing 10 units of application code every day, or of reproducing 9 bugs per day.  Suppose further that a Tester is capable of producing 1 unit of application code every day, or of reproducing 5 bugs per day.  No matter who finds a bug, the Developer will have to produce more code to fix it, but we still generally consider found bugs to be a good worth pursuing.  Now consider all the possible scenarios:

  1. The Developer and the Tester only spend their time reproducing bugs: 
    1. 0 units of application code is written, and 14 bugs are found every day (but found in what?).
  2. The Developer and the Tester only write code: 
    1. 11 units of application code is written, and 0 bugs are reproduced (wheeee!).
  3. The Developer only reproduces bugs, the Tester only writes code: 
    1. 1 unit of code is written, 9 bugs are reproduced. (That is the safest program never released!)
  4. The Developer and the Tester write code half the time, and find bugs half the time: 
    1. 5.5 units of code written, 7 bugs found. (Serious context switching going on here)
  5. The Developer writes code full time, the Tester finds bugs full time: 
    1. 10 units of application code is written, 5 bugs are found and reproduced.
  6. The Developer writes code and tests half the time, while the Tester tests full time:
    1. 5 units of code written, 9.5 bugs found.

As you can see, the mixture of activities that produces the best combination of code written and bugs found is number 5, which is also what any actual developer and tester would intuitively conclude.  Note that this is despite the fact that, in the example, the Developer is better at both writing code AND finding bugs!  Many would dispute the bug-finding productivity I have assigned to my Developer, pointing out that "clear" or "white" box testing is generally inferior to black box testing, as knowledge of the inner workings can bias the production of test cases.  Fair enough, but that would only make the case for testing by Testers even more stark.
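
The arithmetic behind that conclusion is just opportunity cost, and a tiny sketch makes it explicit.  The productivity numbers are the ones from the example above; the class itself is only mine, for illustration:

public class ComparativeTesting {
    public static void main(String[] args) {
        // Daily productivity figures from the example above.
        double devCode = 10, devBugs = 9;
        double testerCode = 1, testerBugs = 5;

        // Opportunity cost of one found bug, measured in units of code given up.
        double devCodePerBug = devCode / devBugs;          // ~1.11 units of code per bug
        double testerCodePerBug = testerCode / testerBugs; // 0.20 units of code per bug

        System.out.printf("Developer gives up %.2f units of code per bug found%n", devCodePerBug);
        System.out.printf("Tester gives up %.2f units of code per bug found%n", testerCodePerBug);

        // The Tester's bugs are far cheaper in forgone code, so full
        // specialization (scenario 5) is the best trade available.
        System.out.printf("Scenario 5: %.1f units of code, %.1f bugs%n", devCode, testerBugs);
    }
}

The Developer's bugs cost more than five times as much forgone code as the Tester's, which is all the Law of Comparative Advantage needs.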


Another adjustment you might make to the example would be to calculate productivity not by units of time, but by cost in dollars.  Since Software Development is, in the current market, a scarcer skill and in higher demand than Testing, it tends to cost more.  If the same exercise were run in dollar units, the results would lean even more heavily toward the employment of Testers for testing in Software Engineering.
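
To see why, hang hypothetical price tags on the same example.  The $600 and $300 daily rates below are made-up numbers chosen only for illustration, not data about anyone's actual salary:

public class BugsPerDollar {
    public static void main(String[] args) {
        // Hypothetical daily costs -- assumptions for illustration only.
        double devCostPerDay = 600.0;
        double testerCostPerDay = 300.0;

        // Bug-finding productivity from the example above.
        double devBugsPerDay = 9.0;
        double testerBugsPerDay = 5.0;

        // Dollars spent per bug found.
        System.out.printf("Developer: $%.2f per bug%n", devCostPerDay / devBugsPerDay);       // ~$66.67
        System.out.printf("Tester:    $%.2f per bug%n", testerCostPerDay / testerBugsPerDay); // $60.00
    }
}

At those made-up rates the Tester is not just comparatively cheaper per bug but absolutely cheaper, even though the Developer finds nearly twice as many bugs per day.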

So, Blogspot, why would such an obvious argument even need to be made?  Well, just as protectionism and self-sufficiency arguments crop up all the time in debates about international trade (witness the witless discussion of U.S. Olympic Team uniforms in 2012), they are starting to creep into discussions about methodology in Software Engineering as well.  And for the same reasons, these discussions should be nipped in the bud, before they start to mean less, and more expensive, software on the market for the rest of us.

Thursday, April 12, 2012

The Political Economy of Software

Dear Blogspot,

In the parlance of economics, "Rent" is defined as money one makes on some product or service above and beyond the cost of producing it. Importantly, it refers to money made on something that is not destroyed during consumption, and where true ownership does not change hands, so that, having rented it to one person, one can rent it again. It seems to me, Blogspot, that software is clearly an area where the money is made through rents.

In subscription-based software services, this point is pretty clear. A consumer pays a periodic cost for access to the software, the value of which disappears after the service contract ends. Traditional software also fits this bill. Since the marginal cost of producing the next unit of software is zero, the software per se can be thought of as a single good. From this perspective, we see that once "sold" to one person, an identical unit can be "sold" again. The important thing, however, is that, at some point, the money made on marginal units is over and above the production cost, which makes it "rent".

But how does this affect the political economy of software production houses? Suppose a software company endeavors to produce one product, without any updates (yes, no place would do that, given the increased sales and rents that come with updates, but stay with me anyway). During production of that software product, the productivity of the designers, developers, QA, etc. is vital. They and their knowledge constitute the true capital of the company, and embody its value in future sales. Once the product is released, however, this productivity and knowledge offer diminishing returns. In a perfectly fluid market where every employee is paid their marginal product at all times, the salaries of these workers would drop, and some, perhaps all, would be laid off. In fact, were it not for the fact that knowledge gained during development constitutes important capital going into an update release, we could imagine the entire development profession being made up of temporary contractors.

Given our example, however, after release the software house almost immediately becomes a building full of rent-seekers. Developers would immediately change their goal from productive software outcomes to convincing the owner that they warrant a cut of the continued sales of the program they produced (time for which they have already been paid). This is pure politics and shenanigans.

Now, stepping outside the thought experiment and back into the real world, Blogspot, my point becomes this: even if software houses DO plan updates, or DO offer subscription services, to what degree do these phenomena show up anyway? Just because the marginal value of continued development is not Zero does not mean that it approaches the value of the work done to produce the first release. And to that degree, one would expect more rent-seeking behavior out of developers than before. The difference in expected productivity might change the nature of the behavior from (in the Zero value case) one of complete fantasy to (in the Normal case) one of exaggerated claims. However, the behavior would still be there.

How do those whose salaries are already tied to current and future sales (investors, owners, perhaps upper management) respond to this realization? Do they realize it at all? That I often wonder.

PostScript:
Of all my letters to you Blogspot, this one seemed the most "out there" to me.  Well, maybe not... see the leaked Valve employee manual.

The relevant quote is on page 17:
"Valve is not averse to all organizational structure—it crops up in many forms all the time, temporarily. But
problems show up when hierarchy or codified divisions of labor either haven’t been created by the group’s members or when those structures persist for long periods of time. We believe those structures inevitably begin to serve their own needs rather than those of Valve’s customers. The hierarchy will begin to reinforce its own structure by hiring people who fit its shape, adding people to fill subordinate support roles. Its members are also incented to engage in rent-seeking behaviors that take advantage of the power structure rather than focusing on simply delivering value to customers."

Wednesday, July 13, 2011

Concentration Matters

Dear Blogspot,

I think I can speak for many programmers when I say that Concentration Matters... a lot. When coding, we often have to wrap our minds around some long train of data marshaling, subtle bits of transformation broken into discrete functions, and balancing the side effects that doing something in one process has on some other process. This requires our brains to bring largish amounts of information into the fore of our mind and KEEP it there while we code, so that our inputs can have a proper course plotted towards the desired output despite the complex set of shoals ahead. This act of keeping this broader information in mind, steering the line of code being written at the moment, while plotting the course immediately ahead, is what I call "concentration". It requires a degree of patience and discipline. A flighty mind need not apply.

Imagine a widget machine that requires a long warmup, but slowly becomes more productive at generating widgets of higher quality over time as it reaches its peak rate in both speed and quality. Now imagine some idiot coming along and turning it off and on every 20 minutes. Welcome to the real world of software development.

For developers, these breaks of concentration take many forms: meetings (especially the impromptu sort), questions, favors, or even social visits. It is especially bad when the interruption comes from another developer who, in trying to maintain his own train of thought, interrupts another developer with a question. As an aside, I would not count developer-initiated events such as web surfing or writing blog entries in this list, since those concentration breaks carry inner costs that sufficiently incentivize against them. Moreover, it can be hard for non-developers to grasp this problem well enough to appreciate it. In my experience, few tasks in software engineering outside of writing code require a level of concentration deep enough that a break in it is a serious matter.

The inner costs I mentioned create perverse incentives that add to the general loss of productivity. After all, if you know you will have to bear the inner cost, and you know your productivity will be lower, why turn the "coding machine" back on at all? Better to wait until you have a reasonable expectation of some period of quiet. For those of us who find the inner costs of interruptions most jarring, better yet to work after everyone else has gone home. (Is this why so many programmers are also "night owls"?)

Occasionally, management will see the wisdom of this insight and take steps to mitigate the damage in order to increase productivity. "No meeting" days, or "no interruption" periods during the day, are interesting, if failed, attempts. Other members of the overall team, from QA to management, all require information to do their jobs effectively; information that is inconveniently concentrated in the mind of the developer. In other words, attempts to fix this problem will invariably affect the productivity of others and of the company overall. By how much? Who knows! Nobody bothers to measure.

So, Blogspot, once again I've proven a useless whiny writer, contemplating problems for which I have no proffered solution, except that these matters should be measured, and a degree of mitigation proper to its overall effectiveness instituted.

I would get right on that, but everyone is headed out to lunch, and I've work I can do now...

Monday, September 20, 2010

The Latest Development Fads...

Hey Blogspot,

Sorry I haven't written in awhile -- I'll have something new soon. In the meantime, I had this random observation today: given that time is money (especially in the software development business), would you Bank on anecdotal evidence?

When the next new technique rolls around, will the talk in the office be about the peer-reviewed studies showing its productivity benefits, or will it all start with "has anyone heard or tried out X"?

- Bo