Staying Safe Online

I have seen a few “staying safe online” guides lately. I wrote one of my own a while back after some of my friends were threatened online and got worried. This guide should be a good starting point for most casual internet users. It’s important to remember that no matter what you do, if it’s online, it can be hacked.

  • Never reuse passwords
    • Some websites are easier to hack than others
    • Hackers will try the same password on other websites
    • Use 1Password for easier management
  • Don’t use simple passwords
    • Hackers guess passwords all the time
    • There are easy automatic tools that enumerate all password options
    • Don’t use your name, birthday, SSN, or any public information in passwords
  • Keep your computer & phone up-to-date
    • Old software has known and easily exploitable vulnerabilities
  • Never click links in emails
    • Clicking the wrong link can give control of your accounts to hackers
    • Manually browse to the website even if the email looks legit
  • Always log out on public computers
    • Preferably never log in on public computers in the first place
    • Data can linger even after logging out
    • Some public computers record your passwords
  • If it was put online, it will stay online
  • Any private information shared can help hacking
    • Your name and birth year can be enough to guess your SSN
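
To put the password advice above in numbers, here is a rough Python sketch. It counts how many candidates a tool would have to enumerate for a uniformly random password and generates a random one with the standard library's secrets module. The function names and the 16-character default are my own choices for illustration, not a recommendation from any particular password manager.

```python
import math
import secrets
import string


def keyspace_bits(alphabet_size, length):
    """Bits of entropy in a uniformly random password of the given length."""
    return length * math.log2(alphabet_size)


def random_password(length=16, alphabet=string.ascii_letters + string.digits):
    """Pick every character independently with a cryptographic RNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    # Six lowercase letters: 26**6 = 308915776 candidates, easy to enumerate.
    print(26 ** 6, round(keyspace_bits(26, 6), 1))
    # Sixteen letters and digits: 62**16 candidates, far out of reach.
    print(round(keyspace_bits(62, 16), 1))
    print(random_password())
```

Every extra character multiplies the number of candidates by the alphabet size, which is why length helps more than clever substitutions. This only holds for random passwords; a name or birthday is guessed long before the full enumeration starts.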

Securing Facebook

  • Click the little lock icon on top and follow instructions
    • Set everything to private
    • Hide your birth year
  • Click the little triangle on the top right and choose Settings
    • Enable login alerts to be notified of hacks
    • Enable login approvals
    • Enable trusted contacts in case your account is hacked

Securing Google Account

Stale MapReduce Staging Directories

I had a problem where HDFS would fill up really fast on my small test cluster. Using hdfs dfs -du I was able to track it down to the MapReduce staging directory under /user/root/.staging. For some reason, it wasn’t always deleting some old job directories. I wasn’t sure why this kept happening on multiple clusters, but I had to come up with a quick workaround. I created a small Python script that lists all staging directories and removes any of them not belonging to a currently running job. The script runs from cron and I can now use my cluster without worrying it’s going to run out of space.

This script is pretty slow and it’s probably possible to make it way faster with Snakebite or even some Java code. That being said, for daily or even hourly clean-up, this script is good enough.

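import os
import re
import subprocess

# NOTE: both regular expressions are assumptions about the CLI output
# format; adjust them to whatever your Hadoop version prints.

# IDs of jobs that are running (1/RUNNING) or preparing (4/PREP).
all_jobs_raw = subprocess.check_output(
  'mapred job -list all'.split(), universal_newlines=True)
running_jobs = re.findall(
  r'^(job_\w+)\s+(?:1|4|RUNNING|PREP)\b',
  all_jobs_raw, re.M)

# Names of the per-job directories under the staging directory.
staging_raw = subprocess.check_output(
  'hdfs dfs -ls /user/root/.staging'.split(), universal_newlines=True)
staging_dirs = re.findall(
  r'^.*/user/root/\.staging/(job_\w+)\s*$',
  staging_raw, re.M)

stale_staging_dirs = set(staging_dirs) - set(running_jobs)

# Remove every staging directory that has no live job behind it.
for stale_dir in stale_staging_dirs:
    os.system(
    'hdfs dfs -rm -r -f -skipTrash ' +
    '/user/root/.staging/%s' % stale_dir)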

The script requires at least Python 2.7 and was tested with Hadoop 2.0.0-cdh4.5.0.

Download PDB by GUID

Sometimes you get stuck with a broken dump or no dump at all. WinDBG just keeps refusing to load symbols as you continue to beg for mercy from the all-knowing deities of Debugging Tools for Windows. You know which PDB you’re looking for, but it just won’t load. The only thing you do know is that you don’t want to go digging for that specific version of your product in the bug report and build a whole setup for it just so you can get the PDB. For those special times, some WinDBG coercion goes a long way.

To download the PDB, create a comma-separated manifest file with three columns per row: the requested PDB name, its GUID plus age (33 characters in total), and the number 1. Then call symchk and pass the path to the manifest file with the /im command line switch. Use the /v command line switch to get the download path of the PDB.

To demonstrate, I’ll use everyone’s favorite debugging sample process.

C:\>echo calc.pdb,E95BB5E08CE640A09C3DBF3DFA3ABCB42,1 > manifest

C:\>symchk /v /im manifest
SYMSRV: Get File Path: /download/symbols/calc.pdb/E95BB5E08CE640A09C3DBF3DFA3ABCB42/calc.pdb
DBGHELP: C:\ProgramData\dbg\sym\calc.pdb\E95BB5E08CE640A09C3DBF3DFA3ABCB42\calc.pdb - OK

SYMCHK: FAILED files = 0
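
If you have the GUID in its usual dashed form, the manifest row can be generated instead of typed. The Python helper below is a small sketch; the function name is made up, and it assumes the age is appended as unpadded uppercase hexadecimal, which matches the single-digit age in the example above but is worth double-checking for larger ages.

```python
def manifest_line(pdb_name, guid, age):
    """Build one symchk manifest row: PDB name, GUID+age, and the number 1."""
    # Strip braces and dashes so the GUID becomes 32 plain hex digits.
    guid_hex = guid.strip("{}").replace("-", "").upper()
    if len(guid_hex) != 32:
        raise ValueError("expected a GUID with 32 hex digits, got %r" % guid)
    int(guid_hex, 16)  # raises ValueError if anything is not a hex digit
    # Assumption: the age goes right after the GUID, in hex, without padding.
    return "%s,%s%X,1" % (pdb_name, guid_hex, age)


if __name__ == "__main__":
    line = manifest_line("calc.pdb", "{E95BB5E0-8CE6-40A0-9C3D-BF3DFA3ABCB4}", 2)
    with open("manifest", "w") as manifest:
        manifest.write(line + "\n")
    print(line)
```

The result is the same row the echo command wrote, so symchk /v /im manifest can be used on the generated file as is.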

To force load the PDB, you need to update the PDB path, turn SYMOPT_LOAD_ANYTHING on, and use the .reload command with /f to force and /i to ignore any so-called mismatches.

kd> .sympath C:\ProgramData\dbg\sym\calc.pdb\E95BB5E08CE640A09C3DBF3DFA3ABCB42
kd> .symopt+0x40
kd> .reload /f /i calc.exe=0x00400000

You should now have access to all the data in the PDB file and stack traces should start making sense.

Android LXR

An open source OS makes debugging applications so much easier. Instead of firing up IDA and going through opcodes, you can simply read the code and sometimes even find comments. However, searching through millions of lines of code can be a daunting task. Operating systems usually have a huge codebase, and even the simple task of looking for one function can take a few good minutes. After reading that function, you usually want to search for functions it calls or functions that call it to better understand the flow. Those extra searches take time too. A good IDE would solve this issue, but it requires downloading and indexing the massive source code first.

LXR was created for this exact reason. It allows hosting a fully indexed copy of the source code. It even makes it easy to publish an index of multiple versions of the source code. Want to compare a certain function between two versions of the Linux kernel? No problem. Want to know which functions use a certain function? Easy. LXR is awesome and fast.

Setting up LXR on your own, however, does take some time and effort. That is why I was happy to find AndroidXref while trying to hunt down a bug in one of my Android applications. It indexes both Android and patched Linux kernel sources for all major versions of Android. It is an invaluable resource every Android developer should know.

I originally had a question about this topic open on StackOverflow with AndroidXref as the accepted answer. It was recently deleted, probably because it didn’t have anything to do with C operator precedence. This is my AndroidXref.SEO++.

Old GDB find

Newer versions of GDB come with the nifty find command. The old version of GDB I have to use does not. It is also incapable of generating a proper stack trace for the platform it supposedly serves. But that’s a whole other story…

Anyway, I found a piece of code that does almost the same. I tweaked it a bit, fixed the stray bug ($x -> %p) and would like to never do it again. So here it is for my future reference and your indulgence.

define find
  set $start = (uint64 *) $arg0
  set $end = $start + $arg1
  set $pattern = (uint64) $arg2
  set $p = $start
  while $p < $end
    if (*(uint64 *) $p) == $pattern
      printf "pattern %p found at %p\n", $pattern, $p
    end
    set $p++
  end
end
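
For reference, here is the same scan as a Python sketch over a bytes buffer, which makes two properties of the macro easier to see. Because of the pointer arithmetic, the second argument counts 8-byte words rather than bytes, and only 8-byte-aligned matches are reported. The function name and the little-endian byte order are assumptions for illustration.

```python
import struct


def find_qwords(buf, base, pattern):
    """Return addresses of aligned 8-byte words in buf that equal pattern."""
    hits = []
    # Step in 8-byte strides, like incrementing a uint64 pointer.
    for offset in range(0, len(buf) - 7, 8):
        # Assumption: the target is little-endian.
        (value,) = struct.unpack_from("<Q", buf, offset)
        if value == pattern:
            hits.append(base + offset)
    return hits


if __name__ == "__main__":
    memory = struct.pack("<QQQ", 1, 0xDEADBEEF, 0xDEADBEEF)
    for address in find_qwords(memory, 0x1000, 0xDEADBEEF):
        print("pattern found at 0x%x" % address)
```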

Hello Android

Humanoid, search engine and one of the most addictive FPS games ever created walk into a bar. A few refreshing cups of coffee later, a joke is born and its name is MW2 Guide.

I’ve created a pretty simple application for Android that helps Call of Duty: Modern Warfare 2 addicts, such as myself, make some sense of the bombardment of dialog boxes popping up after a match. It’s basically a list of all available callsign titles and their descriptions. What sets it apart from a few dozen similar apps is the quick search ability and auto-completion voodoo, in accordance with Android’s search-centric vision.

Search for MW2 Guide on the market, or use the QR code below.

MW2 Guide QR code

SCSIPORT debugging

Microsoft provides useful extensions for debugging SCSIPORT drivers in WinDbg. But with some versions of scsiport.sys, the symbol files don’t contain type information. This produces fun errors like the following.

kd> !scsikd.scsiext 8a392a38
***                                                                   ***
***                                                                   ***
***    Your debugger is not using the correct symbols                 ***
***                                                                   ***
***    In order for this command to work properly, your symbol path   ***
***    must point to .pdb files that have full type information.      ***
***                                                                   ***
***    Certain .pdb files (such as the public OS symbols) do not      ***
***    contain the required information.  Contact the group that      ***
***    provided you with these symbols if you need this command to    ***
***    work.                                                          ***
***                                                                   ***
***    Type referenced: scsiport!_DEVICE_OBJECT                       ***
***                                                                   ***
scsikd error (3): ...\storage\kdext\scsikd\scsikd.c @ line 188

This makes the common task of getting your device extension object very daunting. After some digging, I came up with this code to at least get my device extension object from SCSIPORT’s device extension object.

!drvobj mydriver
* get relevant DevObj
!devobj <devobj>
* get DevExt
dt mydriver!MY_DEVICE_EXTENSION poi(<DevExt> + b4)

I’ve only tried it on Windows XP SP3. The offset may be different with other configurations. Does anyone know a better way around this? The preferable method would naturally be making scsikd work.

It’s not a symptom, it’s a feature

Medical science has developed amazing tools to examine the human body over the years. Petri dishes, incubators, and various types of cultures identify infections. Ultrasound, X-rays, CT, MRI, fMRI and PET use different kinds of technology to give doctors a better view and understanding of our inner workings. In extreme cases, ever-advancing surgery techniques provide a hands-on approach.

Most of us will only get to use these when seriously ill or seriously rich and paranoid. Despite all those mind-boggling technological innovations, when Joe sick-pack goes to the doctor he gets examined with a stethoscope, wooden stick, thermometer, analog sphygmomanometer and a whole lot of MD fingers. Verbal inquiry is another characteristic instrument doctors wield at medically proficient Joe sick-pack, who is prone to lies of shame. It seems the development of diagnostic equipment that is widely available to doctors stalled a few decades ago. Funds keep flowing into research for ever more powerful drugs and fancier high-end diagnostic machines, promoting production of solve-all power tools or solutions for high-profile diseases affecting only a fraction of the population. When all you have is a broad-spectrum antibiotic and ancient diagnostic equipment, everything looks like a superbug. Ironically, overuse of broad-spectrum antibiotics is a catalyst of superbug evolution. Other risks of antibiotics include side effects and allergic reactions, yet antibiotics are still among the most powerful tools at doctors’ disposal.

Absence of efficient analysis methods leads not only to overreliance on solve-all power tools, but also to avoidance of the real issue at hand by both doctors and patients. Falling back to symptom treatment rather than going head to head with the real issue is the simpler choice, especially when facing rudimentary findings that can only be supplemented with extensive and cost-inefficient tests. Such a misinformed treatment could subject the patient to unnecessary side effects and hide the underlying illness by removing its symptoms, allowing it to stride on and mature, reducing chances of early discovery and treatment that can sometimes save a life. It can also subject the patient to unnecessary and dangerous operations where a simpler solution might exist.

It’s easy to blame doctors and the pharmaceutical industry for pushing drugs on unsuspecting patients, but both are just doing their jobs while trying to keep up with overwhelming crowds of sick, aching and impatient masses. Patients get no more than a few minutes each and are handled with archaic diagnostic equipment, forcing workarounds or guesswork and hand-offs to busy specialists. It would seem our healers are doing the best they can under the circumstances, developing and distributing powerful drugs that work for most cases while favoring side effects over precise treatment in the name of cost efficiency and large-scale medicine.

An analogy can be drawn to the computing world and specifically to debugging. Bug squashing consists of the same steps as illness treatment – discover symptoms, analyze, hypothesize, apply fix, rinse and repeat until symptoms disappear. Quality, relevant analytic tools and a deep understanding of the code make analysis easier and improve the chances of spotting and fixing the bug faster, removing the need for quirky workarounds. Powerful and accessible tools like symbols, windbg, sysinternals, virtual machines, logging facilities, scriptable environments and automated test scenarios shed light on system internals and allow an extensive yet concise overview of the issue, quick theory debunking, solutions for common issues, and easy verification of solutions. Imagine how powerful a vital signs logging facility would be at detecting anomalies, how much simpler analysis would be with body part isolation by virtualization, how much less stressful it would be if every piece of the human body was marked with an appropriate name regardless of its current location, how enlightening it would be to view processes in a streamlined graph, and how relaxing it would be for the patient to know all is well on the spot instead of waiting for the test results.

A sci-fi inspired whole-body scanner with shiny lasers, able to detect the issue in a few seconds, fix it with a different color of laser and then make coffee, will probably not be invented for a few more centuries, but there’s no need to get carried away. When debugging a system, human or digital, every little diagnostic tool should help. A cheap, discreet heart monitor for detecting rhythm irregularities, a microbiological culture device able to identify common infections with no need for a microscope, a common antibodies detector, or a portable X-ray device would all reduce the burden and encourage better solutions overall.

Hopefully, the rise in biotechnological studies over the last few years will show its effect soon and shift focus from drugs to widespread diagnostics and narrow treatments.

2012 bug

I recently watched a trailer for the new 2012 movie. It seems like a pretty decent apocalyptic movie written and directed by the same guy behind two other similar movies – Independence Day and The Day After Tomorrow. Famous actors, staggering visual effects and the genre-mandatory destruction of the White House by a ship are all included. It was enough to get me hooked, fully hoping for another immersive experience and the nightmares that will surely follow.

While the movie will probably be a blockbuster and deserves its credit, the concept behind it – an apocalypse occurring on December 21, 2012, predicted by the Mayans – is a misunderstanding at the very least. The Mayans, like any other respectable civilization, had a calendar of their own to keep track of time. Of particular interest is the long count – a cycle of approximately 5125 years. According to it, a day is called k’in. 20 k’ins are one winal. 18 winals are one tun. 20 tuns are one k’atun. 20 k’atuns are one b’ak’atun. Each b’ak’atun is 144,000 days, or approximately 394 years. A long count cycle consists of 13 b’ak’atuns, or about 5125 years. Day zero, believed to be the creation day, is August 11, 3114 BC. Mayan math is base-20 and so a date can be represented by five digits. In ye olde times that would be five groups of a bunch of stripes and dots listed from top to bottom. To make things more manageable for us modern people, a series of 5 modern numbers separated by dots is used. Today, for example, is 12.19.16.10.16. That is 12 b’ak’atuns, 19 k’atuns, 16 tuns, 10 winals and 16 k’ins, or about 5122 years since day zero.
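
The long count arithmetic is easy to check with a few lines of Python. This is only a sketch: the function names are made up, and the conversion to years assumes a mean Gregorian year of 365.2425 days. It works through the full 13 b’ak’atun cycle and a date of 12 b’ak’atuns, 19 k’atuns, 16 tuns, 10 winals and 16 k’ins.

```python
# Days per place, most significant first: b'ak'atun, k'atun, tun, winal, k'in.
PLACE_DAYS = (144000, 7200, 360, 20, 1)

# Assumption: mean Gregorian year length, used only for the year estimates.
DAYS_PER_YEAR = 365.2425


def long_count_to_days(date):
    """Convert a dotted long count date such as '13.0.0.0.0' to days."""
    digits = [int(part) for part in date.split(".")]
    if len(digits) != len(PLACE_DAYS):
        raise ValueError("expected five dot-separated numbers, got %r" % date)
    return sum(digit * days for digit, days in zip(digits, PLACE_DAYS))


def days_to_years(days):
    """Approximate number of years covered by the given number of days."""
    return days / DAYS_PER_YEAR


if __name__ == "__main__":
    for date in ("13.0.0.0.0", "12.19.16.10.16"):
        days = long_count_to_days(date)
        print(date, days, round(days_to_years(days), 1))
```

A full cycle is 1,872,000 days, which comes to about 5125 years with the year length above. Dividing by a flat 365 days instead gives about 5129.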

Through various lines of reasoning, certain academic scholars have concluded that at the end of each such cycle comes a grand and possibly cataclysmic event. Mayan scriptures make no direct reference to such an event, and my personal belief is that the interpretation of 13-b’ak’atun-ia party invitations has gone seriously awry, but an even simpler explanation exists.

Much like modern-day engineers, Mayans had to carefully balance versatility and resources and perform a cost-benefit analysis. Understandably, they decided including the long count index in every date would be a waste of resources. Instead of carving six digits to accommodate multiple long count cycles, they opted for ambiguity by implying the long count cycle. Imagine the vast amount of stone that would have gone to waste had every contract, ticket, advertisement, newspaper and document of Mayan times included another digit, just so it could be valid 5000 years into the future, long after they and everything they knew were dead. That brilliant decision probably allowed the construction of another pyramid or two.

In fact, the Mayans are to be admired. When our modern-day engineers faced the same challenge, they opted for a century time frame to save resources, thus unleashing the infamous Y2K bug onto an unsuspecting world. It was believed date ambiguity would cause banks to fail, computers to crash and burn, zombies to overrun the streets and anniversaries to be forgotten, thus eliminating any possibility of further human reproduction. Much the same and more is being predicted for 2012 with the same reasoning. The apocalypse is looming at an arbitrary date due to green and efficient Mayan engineering. But despite widespread usage of technology and date abbreviation in our days, short of a few minor glitches, nothing occurred on January 1, 2000. Considering the 2012 bug concerns ancient technology no longer in use, the idea seems even more absurd.

Therefore, assuming you are not using Mayan computers, living in a mortgaged Mayan pyramid or somehow related to Indiana Jones, you’re welcome to join me for a Mayan-themed end-of-the-world movie marathon on December 22, 2012.