Learning Ghidra's Scripting

My day-to-day focus is Apple platforms, but my free-time projects lean heavily toward the Wii. So when I want to figure out how anything works, I typically throw it straight at Ghidra. With the sole exception of some Objective-C/Swift handling (in which case I use Hopper alongside!), Ghidra is an insanely robust tool that provides accurate disassembly - and usually decompilation - of binaries from just about any platform.

Prologue

We use Ghidra when working on the Wii Shop Channel for the Open Shop Channel. This project aims to repurpose the WSC for homebrew titles. Within the shop, Nintendo provides a rather featureful JavaScript API calling back to EC, their "ECommerce Library" utilized for title management.

While updating our documentation of the available APIs, I noticed an interesting trend. The Wii Shop Channel, similar to other HTML-utilizing channels on the Wii (the Internet Channel, or the System Menu), utilizes Opera's native JS engine. As such, Nintendo developed the concept of JS "plugins", exposing objects by name. This includes things such as wiiKeyboard for keyboard functionality, wiiMii for Mii-related functionality, or wiiNwc24 with WC24-related functions accordingly.

However, for EC's exposed JavaScript functions, Nintendo wrote their plugin API a little differently. EC exposes several objects directly, which I assume required a different architecture to register. Perhaps BroadOn, the company Nintendo contracted most online development to, authored this code. Regardless, consider some slightly doctored C for one of the simpler objects registered, ECCreditCardPayment:

void ECCreditCardPaymentData::__ct(void *unk, void *unk2, int paramCount, void *paramData) {
  if (paramCount == 1) {
    if (paramData->type == TYPE_STRING) {
      // instantiate, copy credit card number
    } else {
      ec::logmsg(EC_LEVEL_ERROR, "ECCreditCardPayment cc number agument 1 must be a string\n");
      // return error
    }
  } else if (paramCount != 0) {
    ec::logmsg(EC_LEVEL_ERROR, "Invalid number of ECCreditCardPayment constructor arguments: %d, must be 0 or 1\n", paramCount);
    // return error
  }
}

Hmm.. so they log "agument" information. This sounds familiar, however - one function called for methods of any EC object has something similar. I call it registerJsFunction. It's called like this within ECommerceInterface:

registerJsFunction(paramCount, paramData, &SOME_SCARY_ARRAY, 0, 1, "syncTickets");

Based on this, we can guess that we're passing the object's parameter count, parameter data, some.. scary array, hardcoded numbers, and the method name. What's going on here? Let's look at the function..

int registerJsFunction(int paramCount, void *paramData, void *someArray, int unk, int unk2, char *name) {
  if ((paramCount < unk) || (unk2 < paramCount)) {
    if (unk == unk2) {
      ec::logmsg(EC_LEVEL_ERROR, "%s has %d arguments but requires %d argumnets\n", name, paramCount, unk);
    } else {
      ec::logmsg(EC_LEVEL_ERROR, "%s has %d arguments but requires %d to %d argumnets\n", name, paramCount, unk, unk2);
    }

    // return error
  }
}

Well, that was simple. It seems unk is the lower count of “argumnets” (does Nintendo need a Grammarly subscription?) and unk2 is the higher count. Insanely simple, all things considered.

Let's delve into someArray for a second, and come back to paramData afterwards.

We can find data similar to the following at SOME_SCARY_ARRAY for our syncTickets example above:

[Image: untagged data within Ghidra. Read on to see the matching structure; the actual data is unimportant.]

After a string of inferences from other parts of the code, here is what we'll use to represent the structure so far:

struct ExpectedArgument {
  enum WWWJSPType type;
  char* name;
  byte unknown[8];
};

I obtained the name WWWJSPType from a string stating the following for parameter data:

setter_ECProgres "description" ERROR: value->type %d != WWWJSPTypeString %d

In this case, %d was hardcoded to 1. I assume Opera (or Nintendo? or BroadOn?) probably used constants, but an enum is a lot easier for our purposes within Ghidra.

Let's see where else we may be able to expand our JS type enums. Perhaps a JS number? We'll search strings, and... it seems ECAccountPayment has what we're looking for!

if (paramData->type == 2) {
  // [...converts number from floating point to unsigned, snprintfs to number...]
  convertFloatingPointToUnsigned(paramData->data);
  char* convertedNumber = snprintfToNumber();
  
  strncpy(otherData, convertedNumber, 0x20);
} else if (paramData->type == TYPE_STRING) {
  strncpy(otherData, paramData->data, 0x20);
} else {
  ec::logmsg(EC_LEVEL_ERROR, "ECAccountPayment account id agument 1 must be a number or string\n");
  // return error
}

While of course all of this could have been easily verified by me setting breakpoints and observing the parameter data in memory, static analysis felt easier. So now we know another type!

typedef enum WWWJSPType {
  TYPE_STRING = 1,
  TYPE_NUMBER = 2
} WWWJSPType;

This additionally helps us recognize some of the other unknown data in our other example. I'll spare you the specifics, but I believe that ExpectedArgument has this structure in the end:

struct ExpectedArgument {
  enum WWWJSPType type;
  char* name;
  bool hasSecondType;
  // padding for 3 bytes, compiler aligns to 4 bytes
  enum WWWJSPType secondType;
};

And, I don't know about you, but that feels so, so much more complete.

[Image: our more complete structure applied to the previous data example. Its first type is a string, and its second type is a number.]
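To make the layout concrete, here is a minimal sketch of decoding one such record from raw bytes in plain Java. It assumes the fields are laid out exactly as in the struct above (16 bytes in total, with 4-byte enums and pointers) and that the data is big-endian, as on the Wii's PowerPC CPU. The class name, the helper, and the sample bytes are all made up for illustration; none of them come from the binary or from Ghidra.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: decodes one ExpectedArgument record from raw memory.
final class ExpectedArgumentDecoder {
    // Assumed size: 4 (type) + 4 (name pointer) + 1 (bool) + 3 (padding) + 4 (secondType).
    static final int RECORD_SIZE = 16;

    static final class ExpectedArgument {
        final int type;              // enum WWWJSPType type
        final long namePointer;      // char* name, kept as an address rather than resolved
        final boolean hasSecondType; // bool hasSecondType
        final int secondType;        // enum WWWJSPType secondType

        ExpectedArgument(int type, long namePointer, boolean hasSecondType, int secondType) {
            this.type = type;
            this.namePointer = namePointer;
            this.hasSecondType = hasSecondType;
            this.secondType = secondType;
        }
    }

    static ExpectedArgument decode(byte[] raw, int offset) {
        // Multi-byte fields are read big-endian (assumption: PowerPC byte order).
        ByteBuffer buf = ByteBuffer.wrap(raw, offset, RECORD_SIZE).order(ByteOrder.BIG_ENDIAN);
        int type = buf.getInt();
        long namePointer = buf.getInt() & 0xFFFFFFFFL; // treat the 32-bit pointer as unsigned
        boolean hasSecondType = buf.get() != 0;
        buf.position(buf.position() + 3);              // skip the 3 padding bytes
        int secondType = buf.getInt();
        return new ExpectedArgument(type, namePointer, hasSecondType, secondType);
    }

    public static void main(String[] args) {
        // Made-up sample: a string argument that may also be a number.
        byte[] sample = {
            0x00, 0x00, 0x00, 0x01,         // type = 1 (TYPE_STRING)
            (byte) 0x80, 0x32, 0x54, 0x60,  // name pointer (hypothetical address)
            0x01, 0x00, 0x00, 0x00,         // hasSecondType = true, then padding
            0x00, 0x00, 0x00, 0x02          // secondType = 2 (TYPE_NUMBER)
        };
        ExpectedArgument arg = decode(sample, 0);
        System.out.printf("type=%d name@%08x hasSecondType=%b secondType=%d%n",
            arg.type, arg.namePointer, arg.hasSecondType, arg.secondType);
    }
}
```

The name is left as a raw address here because resolving it means reading a C string from elsewhere in memory, which is a separate step.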

With our unknowns resolved, let's continue through its logic...

int registerJsFunction(int paramCount, ParamData *paramData, ExpectedArgument *expectedArgs, int lowerArgCount, int upperArgCount, char *name) {
  // [... validate size, as shown above ...]

  for (int i = 0; (i < upperArgCount && i < paramCount); i++) {
    WWWJSPType actualType = paramData[i].type;
    ExpectedArgument* arg = &expectedArgs[i];
    if (actualType != arg->type &&
        (arg->hasSecondType == false || actualType != arg->secondType)) {
      ec::logmsg(EC_LEVEL_ERROR, "%s has invalid type %d for arg %d: %s\n", name, actualType, i+1, arg->name);
      // return error
    }
    
    // [...]
  }

  // [...]
}

Perfect. :)
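For anyone who prefers to poke at the logic outside of a decompiler, the checks in registerJsFunction can be modeled in a few lines of plain Java. This is only a sketch of my reading of the decompiled code, not Nintendo's implementation: the class, the method, and the "exampleMethod"/"accountId" names are invented, and the messages merely imitate the logged strings (with the spelling fixed).

```java
import java.util.Optional;

// Hypothetical model of the count and type checks described above.
final class JsArgumentCheck {
    static final int TYPE_STRING = 1;
    static final int TYPE_NUMBER = 2;

    static final class ExpectedArgument {
        final int type;
        final String name;
        final boolean hasSecondType;
        final int secondType;

        ExpectedArgument(int type, String name, boolean hasSecondType, int secondType) {
            this.type = type;
            this.name = name;
            this.hasSecondType = hasSecondType;
            this.secondType = secondType;
        }
    }

    // Returns an error message, or an empty Optional when the call is acceptable.
    static Optional<String> validate(String name, int[] actualTypes,
            ExpectedArgument[] expected, int lower, int upper) {
        int paramCount = actualTypes.length;

        // First check: the argument count must fall within [lower, upper].
        if (paramCount < lower || upper < paramCount) {
            if (lower == upper) {
                return Optional.of(String.format(
                    "%s has %d arguments but requires %d arguments", name, paramCount, lower));
            }
            return Optional.of(String.format(
                "%s has %d arguments but requires %d to %d arguments", name, paramCount, lower, upper));
        }

        // Second check: each supplied argument must match its first or optional second type.
        for (int i = 0; i < upper && i < paramCount; i++) {
            ExpectedArgument arg = expected[i];
            int actual = actualTypes[i];
            boolean matchesFirst = actual == arg.type;
            boolean matchesSecond = arg.hasSecondType && actual == arg.secondType;
            if (!matchesFirst && !matchesSecond) {
                return Optional.of(String.format(
                    "%s has invalid type %d for arg %d: %s", name, actual, i + 1, arg.name));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ExpectedArgument[] expected = {
            new ExpectedArgument(TYPE_STRING, "accountId", true, TYPE_NUMBER)
        };
        System.out.println(validate("exampleMethod", new int[] { TYPE_NUMBER }, expected, 0, 1));
        System.out.println(validate("exampleMethod", new int[] { 3 }, expected, 0, 1));
    }
}
```

Returning an Optional rather than logging keeps the sketch easy to test; the real function logs and returns an error code instead.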

Ghidra Scripting

Ideally, it would be nice to now iterate through all calls of registerJsFunction to parse their arguments and types. While I could do it manually, that sounds like effort. I'm in the mood to learn as well, so let's investigate Ghidra's scripting functionality!

I quickly learned that Ghidra has a rich scripting API available in two forms: a Java-based script API, and a Python API, heavily modeled after Java. For the purpose of this, we're going to focus on their Java API directly, simply because it's new and we can! (Truthfully there is little difference, as the Python API nearly exposes the Java API 1:1.)

I opened up the Script Manager and created a new Java script, which I'll call ReadJSFuncArgs.java. We're greeted with basic, yet detailed, boilerplate:

//Iterates through all calls to registerJsFunction and
//outputs usable HTML for GitBook.
//@author Spotlight
//@category Wii
//@keybinding 
//@menupath 
//@toolbar 

import ghidra.app.script.GhidraScript;
import ghidra.program.model.mem.*;
import ghidra.program.model.lang.*;
import ghidra.program.model.pcode.*;
import ghidra.program.model.util.*;
import ghidra.program.model.reloc.*;
import ghidra.program.model.data.*;
import ghidra.program.model.block.*;
import ghidra.program.model.symbol.*;
import ghidra.program.model.scalar.*;
import ghidra.program.model.listing.*;
import ghidra.program.model.address.*;

public class ReadJSFuncArgs extends GhidraScript {
    public void run() throws Exception {
      // TODO Add User Code Here
    }
}

Works perfectly! From now on, all Java snippets will be within run.

registerJsFunction is present at 0x80091620 in our binary. We can easily obtain all references with a few lines of code:

Address registerFunc = toAddr(0x80091620);
Reference[] refs = getReferencesTo(registerFunc);
for (Reference ref: refs) {
  println(ref.toString());
}

From here, I wanted to see if I could somehow determine function parameter values within the calling functions, as Ghidra does.

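Thankfully, Nintendo (or... the compiler they chose?) follows the SVR4 calling ABI for PowerPC. Integer and pointer arguments are passed in GPR3 onwards, so our six arguments occupy GPR3-8 in order: paramCount, paramData, expectedArgs, lowerArgCount, upperArgCount, and name.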

We can ignore paramCount and paramData, as they're set at runtime. Thankfully, based on a manual analysis of all 62 functions, expectedArgs is null when the lower and upper argument counts are both zero.

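Ghidra's API provides a way to make determining these values even easier. Instead of manually scraping register values - my initial, unwieldy approach - we can take three steps: find and decompile every calling function, locate each CALL to registerJsFunction within it, and read that call's inputs.

We can start by finding all calling functions: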

List<Function> callers = new ArrayList<Function>();

Reference[] refs = getReferencesTo(registerFunc);
for (Reference ref: refs) {
  Address callingAddress = ref.getFromAddress();
  // This is the function containing the call site, i.e. the caller.
  Function caller = getFunctionContaining(callingAddress);
  if (caller != null) {
    callers.add(caller);
  }
}

// decompile

From there, we can decompile the function.

When decompilation is complete, we're given an iterator over PcodeOps. This special type provides metadata about every instruction. The Javadoc describes it as microcode, of sorts, for the given Ghidra language. Insanely useful!

We can ensure the current op is a PcodeOp.CALL and that its zeroth input - the called function's address - is 0x80091620, registerJsFunction. Two steps down, one to go!

Every input on a PcodeOp is a Varnode, permitting reading of the supplied call data.

If the zeroth input is the address of our called function, we know that the first and second inputs are paramCount and paramData. The remaining inputs follow in order:

Varnode expectedArgs = op.getInput(3);
Varnode lowerArgCount = op.getInput(4);
Varnode upperArgCount = op.getInput(5);
Varnode name = op.getInput(6);

We can then call getOffset() on these Varnodes. Curiously - contrary to what I expected - the offset of a constant is its value, i.e. a lowerArgCount of 6 has an offset of 6. This is because p-code stores constants in a dedicated constant address space, where the offset is the value itself. At least it makes life easier, I think!

From here on, we apply the magic of a function named traceVarnodeValue. To be completely honest, I am not 100% certain of what it does - I believe it follows our provided inputs and somehow determines a value from there. But it works, yeah? 😅
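For what it's worth, reading the switch statement in the full listing at the end of this post suggests the following behavior: walk backwards from a value to the operation that defined it, step through CAST/COPY via their first input and PTRSUB/PTRADD via their second, and stop once a constant is reached. The sketch below models that walk in plain Java. The Node class and Op enum are hypothetical stand-ins of my own, not Ghidra's Varnode or PcodeOp, and the example chain is an assumption about how a global's address tends to appear in decompiled p-code.

```java
// Hypothetical model of the backwards walk performed by traceVarnodeValue.
final class TraceSketch {
    enum Op { CAST, COPY, PTRSUB, PTRADD, INT_MULT, MULTIEQUAL, LOAD }

    // Stand-in for a Varnode: either a constant, or a value defined by an operation.
    static final class Node {
        final boolean isConstant;
        final long offset;
        final Op definedBy;   // null for constants and for values with no known definition
        final Node[] inputs;

        private Node(boolean isConstant, long offset, Op definedBy, Node[] inputs) {
            this.isConstant = isConstant;
            this.offset = offset;
            this.definedBy = definedBy;
            this.inputs = inputs;
        }

        static Node constant(long value) {
            return new Node(true, value, null, new Node[0]);
        }

        static Node defined(Op op, Node... inputs) {
            return new Node(false, 0, op, inputs);
        }
    }

    // Sentinel standing in for Address.NO_ADDRESS in the real script.
    static final long NO_ADDRESS = -1;

    static long trace(Node node) {
        while (!node.isConstant) {
            if (node.definedBy == null) {
                break; // nothing further to follow
            }
            switch (node.definedBy) {
                case CAST:
                case COPY:
                    node = node.inputs[0]; // the value being cast or copied
                    break;
                case PTRSUB:
                case PTRADD:
                    node = node.inputs[1]; // the offset added to the base pointer
                    break;
                case INT_MULT:
                case MULTIEQUAL:
                    return NO_ADDRESS;     // computed or merged values are not traced
                default:
                    throw new IllegalArgumentException("Unknown opcode " + node.definedBy);
            }
        }
        return node.offset;
    }

    public static void main(String[] args) {
        // Assumed shape: CAST(PTRSUB(0, 0x80325470)), i.e. a cast pointer to a global.
        Node address = Node.constant(0x80325470L);
        Node pointer = Node.defined(Op.PTRSUB, Node.constant(0), address);
        Node cast = Node.defined(Op.CAST, pointer);
        System.out.printf("%08x%n", trace(cast));
    }
}
```

If this reading is right, the function never evaluates anything. It only peels away wrappers until it finds the constant underneath, which would explain why it gives up on multiplications and merged values.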

I additionally wrote a function named getString, which utilizes the Varnode's value to obtain the Data at that address and reads it as a string:

String getString(Varnode node) throws Exception {
  Address value = traceVarnodeValue(node);
  Data data = getDataAt(value);
  return (String)data.getValue();
}

This works perfectly!

We can put together a simple program to script reading arguments:

// We can later access memory at this offset to determine our values.
Address expectedArgs = traceVarnodeValue(op.getInput(3));

// Determine calling count.
long lowerArgCount = op.getInput(4).getOffset();
long upperArgCount = op.getInput(5).getOffset();

// Determine the function's name.
String functionName = getString(op.getInput(6));

// Our expectedArgs pointer will be null, and lower/upper args are both zero for arg-less functions.
if (expectedArgs.getOffset() == 0 && lowerArgCount == 0 && upperArgCount == 0) {
  printf("Name %s\n", functionName);
} else {
  printf("Name %s: lower %d, upper %d, expected arguments @ %s\n", functionName, lowerArgCount, upperArgCount, expectedArgs);
}

Example output:

Name trace: lower 0, upper 1, expected arguments @ 80324af8
Name getProgress
Name cancelOperation
Name purchaseTitle: lower 5, upper 9, expected arguments @ 80325090
Name purchaseGiftTitle: lower 7, upper 11, expected arguments @ 80325208
Name acceptGiftTitle: lower 3, upper 5, expected arguments @ 803253e8
Name syncTickets: lower 0, upper 1, expected arguments @ 80325470
Name checkDeviceStatus
Name refreshCachedBalance
Name purchasePoints: lower 4, upper 8, expected arguments @ 80325528
Name downloadTitle: lower 1, upper 1, expected arguments @ 803256a0
Name checkFirmware: lower 1, upper 1, expected arguments @ 803256c8
Name generateKeyPair
Name confirmKeyPair
Name checkRegistration
Name register: lower 0, upper 3, expected arguments @ 80325798
Name unregister: lower 0, upper 1, expected arguments @ 80325840
Name transfer: lower 0, upper 1, expected arguments @ 80325898
Name syncRegistration: lower 0, upper 1, expected arguments @ 80325910
Name deleteOwnership: lower 2, upper 2, expected arguments @ 80325980
Name sendChallengeReq
Name getChallengeResp
Name reportCSS: lower 6, upper 7, expected arguments @ 80325a78
Name confirmCSS: lower 7, upper 7, expected arguments @ 80325b58
Name getCSSConfirmation
Name getTitleInfo: lower 1, upper 1, expected arguments @ 80325c80
Name getTitleInfos
Name getTicketInfos: lower 1, upper 1, expected arguments @ 80325d18
Name getDeviceInfo
Name getCachedBalance
Name getPurchaseInfo
Name getTransactionInfos
Name checkParentalControlPassword: lower 1, upper 1, expected arguments @ 80325e60
Name setLanguage: lower 1, upper 1, expected arguments @ 80325ea8
Name setCountry: lower 1, upper 1, expected arguments @ 80325ee0
Name setRegion: lower 1, upper 1, expected arguments @ 80325f10
Name setAge: lower 1, upper 1, expected arguments @ 80325f40
Name setAccountId: lower 2, upper 2, expected arguments @ 80325f80
Name setWebSvcUrls: lower 0, upper 3, expected arguments @ 80325ff0
Name setContentUrls: lower 0, upper 2, expected arguments @ 803260d0
Name deleteTitleContent: lower 1, upper 1, expected arguments @ 80326120
Name deleteTitle: lower 1, upper 1, expected arguments @ 80326158
Name deleteLocalTicket: lower 2, upper 2, expected arguments @ 80326198
Name launchTitle: lower 2, upper 2, expected arguments @ 80326208
Name request: lower 1, upper 1, expected arguments @ 80326268
Name getSessionValue: lower 1, upper 1, expected arguments @ 803262b8
Name setSessionValue: lower 1, upper 2, expected arguments @ 803262c8
Name getPersistentValue: lower 1, upper 1, expected arguments @ 803263e0
Name setPersistentValue: lower 1, upper 2, expected arguments @ 803263f0
Name getWeakToken
Name pubKeyEncrypt: lower 1, upper 1, expected arguments @ 80326418
Name setOption: lower 2, upper 2, expected arguments @ 80326440
Name startLog: lower 1, upper 1, expected arguments @ 803264a0
Name getLog
Name stopLog
Name runTests
Name titleLimits.get: lower 1, upper 1, expected arguments @ 803268f0
Name titleInfos.get: lower 1, upper 1, expected arguments @ 80326b80
Name ticketInfos.get: lower 1, upper 1, expected arguments @ 80326ca0
Name transactionInfos.get: lower 1, upper 1, expected arguments @ 80326f90
Name getVersion
Name setParameter: lower 1, upper 2, expected arguments @ 80324fa0

Reading our expected arguments from the structure

While the above looks far more like what we would prefer, we now need to read our expected arguments. It appears we would work with the Data type once more. However, I am giving up.

Conclusion

I started this knowing nothing about Ghidra's API. I realize now that it's extraordinarily powerful and well developed.

It also would have been easier if I had done this by hand. I finished it in about an hour manually. Perhaps some day I will find a faster way to iterate through types.

Please find my work-in-progress source code as follows:

ReadJSFuncArgs.java
//Iterates through all calls to registerJsFunction and
//outputs Markdown.
//@author Spotlight
//@category Wii
//@keybinding 
//@menupath 
//@toolbar 

import ghidra.app.script.GhidraScript;
import ghidra.program.model.mem.*;
import ghidra.program.model.lang.*;
import ghidra.program.model.pcode.*;
import ghidra.program.model.util.*;
import ghidra.program.model.reloc.*;
import ghidra.program.model.data.*;
import ghidra.program.model.block.*;
import ghidra.program.model.symbol.*;
import ghidra.program.model.scalar.*;
import ghidra.program.model.listing.*;
import ghidra.program.model.address.*;

import ghidra.util.task.ConsoleTaskMonitor;
import ghidra.app.decompiler.DecompInterface;
import ghidra.app.decompiler.DecompileResults;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReadJSFuncArgs extends GhidraScript {
  public void run() throws Exception {
    // We hardcode 0x80091620 as it must not change.
    // This is the address of "registerJsFunc" in v21 of the Wii Shop Channel.
    Address registerFunc = toAddr(0x80091620);

    // We need to keep track of all invoking functions, as determined from our callers.
    List<Function> callers = new ArrayList<Function>();

    // Determine calling functions.
    Reference[] refs = getReferencesTo(registerFunc);
    for (Reference ref: refs) {
      Address callingAddress = ref.getFromAddress();
      // This is the function containing the call site, i.e. the caller.
      Function caller = getFunctionContaining(callingAddress);
      if (caller != null) {
        callers.add(caller);
      }
    }

    // Necessary for decompilation.
    ConsoleTaskMonitor monitor = new ConsoleTaskMonitor();

    DecompInterface ifc = new DecompInterface();
    ifc.toggleCCode(false);
    ifc.openProgram(currentProgram);

    // Decompile all calling functions.
    for (Function caller: callers) {
      DecompileResults res = ifc.decompileFunction(caller, 0, monitor);
    
      // Ensure no errors occurred
      if (!res.decompileCompleted()) {
        println(res.getErrorMessage());
        return;
      }

      HighFunction current = res.getHighFunction();
      Iterator<PcodeOpAST> funcOps = current.getPcodeOps();
      while (funcOps.hasNext()) {
        PcodeOpAST op = funcOps.next();

        // We only want to handle CALL pcodes.
        if (op.getOpcode() != PcodeOp.CALL) {
          continue;
        }

        // We only want to handle calls to our registration function.
        if (!op.getInput(0).getAddress().equals(registerFunc)) {
          continue;
        }
        
        handleCallArgs(op);
      }
    }
  }

  void handleCallArgs(PcodeOp op) throws Exception {
    // 0 holds the called function's address
    // 1 holds `paramCount`, nullable
    // 2 holds `paramData`, nullable
    // 3 holds `expectedArgs`, nullable
    // 4 holds `lowerArgCount`
    // 5 holds `upperArgCount`
    // 6 holds `name`


    // We can later access memory at this offset to determine our values.
    Address expectedArgs = traceVarnodeValue(op.getInput(3));

    // Determine calling count.
    long lowerArgCount = op.getInput(4).getOffset();
    long upperArgCount = op.getInput(5).getOffset();

    // Determine the function's name.
    String functionName = getString(op.getInput(6));

    // Our expectedArgs pointer will be null, and lower/upper args are both zero for arg-less functions.
    if (expectedArgs.getOffset() == 0 && lowerArgCount == 0 && upperArgCount == 0) {
      printf("Name %s\n", functionName);
      return;
    }

    printf("Name %s: lower %d, upper %d, expected arguments @ %s\n", functionName, lowerArgCount, upperArgCount, expectedArgs);

    println(expectedArgs + "");
    Data arguments = getDataAt(expectedArgs);
    println(arguments.getDataType().getName());
  }

  // I found this function while scouring about online.
  // I have absolutely no clue how it works - presumably just following the pointer?
  // I.. have no clue. I'm sorry :(
  // It works, at least
  private Address traceVarnodeValue(Varnode argument) throws IllegalArgumentException {
    while (!argument.isConstant()) {
      PcodeOp ins = argument.getDef();
      if (ins == null)
        break;
      switch (ins.getOpcode()) {
      case PcodeOp.CAST:
      case PcodeOp.COPY:
        argument = ins.getInput(0);
        break;
      case PcodeOp.PTRSUB:
      case PcodeOp.PTRADD:
        argument = ins.getInput(1);
        break;
      case PcodeOp.INT_MULT:
      case PcodeOp.MULTIEQUAL:
        return Address.NO_ADDRESS;
      default:
        throw new IllegalArgumentException(String.format("Unknown opcode %s for variable copy at %08X",
            ins.getMnemonic(), argument.getAddress().getOffset()));
      }
    }
    return toAddr(argument.getOffset());
  }  

  // Traces for the Varnode's represented Address and returns the String of its Data.
  String getString(Varnode node) throws Exception {
    Address value = traceVarnodeValue(node);
    Data data = getDataAt(value);
    return (String)data.getValue();
  }
}