For anyone who might be interested, the idea I'm basing this on is:
http://en.wikipedia.org/wiki/ESheep
I strongly recommend trying that program out if you want to see what I mean. Obviously ours could be much prettier than that program, as we have access to better sprites and sounds. To make things easier, maybe someone could reverse engineer that program so we know roughly how it works? Again, I've no real clue about these things; these are just thoughts.
I also realise that the sprites will look small at anything but a low screen resolution, but I assume scaling them up wouldn't be too difficult.
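On the scaling point, here is a rough sketch of what I have in mind. It assumes a sprite is just a grid (list of rows) of pixel values and that we scale by a whole-number factor, repeating each pixel so the art stays crisp rather than blurry. The name scale_sprite and the grid layout are made up for illustration; they are not taken from eSheep or any existing code.

```python
def scale_sprite(pixels, factor):
    """Upscale a 2D grid of pixel values by an integer factor.

    Uses nearest-neighbour scaling: every source pixel becomes a
    factor x factor block. Hypothetical helper, for illustration only.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    scaled = []
    for row in pixels:
        # Repeat each pixel horizontally.
        wide_row = [px for px in row for _ in range(factor)]
        # Repeat the widened row vertically (copy so rows are independent).
        for _ in range(factor):
            scaled.append(list(wide_row))
    return scaled


# A tiny 2x2 "sprite" where 0 and 1 stand in for two colours.
sprite = [
    [0, 1],
    [1, 0],
]
big = scale_sprite(sprite, 2)
for row in big:
    print(row)
```

A real image library would probably do this for us, but it shows that the basic operation is only a few lines.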