Building My Own Siri / Jarvis

January 13, 2012


Most of the magic behind Siri happens remotely.

I want to create my OWN version of Siri… except I don’t care about having it on my phone. I want my entire house to talk to me, more like Jarvis (from Iron Man).

I believe I have access to all the right resources to create this AI.
It breaks down into three major parts:
1) convert speech to text
2) query a database populated with Q&A
3) convert text to speech

Speech to Text

Most speech-to-text engines suck. Siri’s works exceptionally well because the engine isn’t on your phone… it’s remote. I suppose we could hack Siri by running a MITM attack on an iPhone, faking the SSL cert, and intercepting the Apple ID… OR we can do something much simpler. Google’s Chrome 11 browser includes a voice input function (which isn’t yet part of the HTML5 standard) that converts your speech into text. This guy discovered that the conversion happens remotely through an undocumented API call to Google. All we have to do is access this same API and we’ve got ourselves a free speech-to-text engine!

In case you don’t understand Perl, this is how you use the API:

POST to: https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US

POST body: Content (the raw contents of a .flac encoding of your voice, recorded in mono at 16000 Hz or 8000 Hz)
Header: Content_Type, i.e. the Content-Type header of the request (which should read “audio/x-flac; rate=16000”, or rate=8000, depending on your voice recording)

Response: json text
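
For anyone who would rather see it in PHP than Perl, here is a minimal sketch of how that request could be assembled. It assumes the audio goes out as the raw POST body with the sample rate in the Content-Type header. `build_stt_request` and the array it returns are made-up names for illustration, and nothing is actually sent.

```php
<?php
// Hypothetical helper (not part of any Google API): collects the URL, header
// and body for the speech request described above, without sending anything.
function build_stt_request($flacBytes, $rate = 16000, $lang = 'en-US')
{
    // The post only mentions 16000 Hz and 8000 Hz recordings.
    if ($rate !== 16000 && $rate !== 8000) {
        throw new InvalidArgumentException('rate must be 16000 or 8000');
    }
    $query = http_build_query(array(
        'xjerr'  => 1,
        'client' => 'chromium',
        'lang'   => $lang,
    ), '', '&');

    return array(
        'url'     => 'https://www.google.com/speech-api/v1/recognize?' . $query,
        'headers' => array('Content-Type: audio/x-flac; rate=' . $rate),
        'body'    => $flacBytes, // assumption: raw FLAC bytes as the POST body
    );
}
```

The three pieces map onto CURLOPT_URL, CURLOPT_HTTPHEADER and CURLOPT_POSTFIELDS if you send the request with cURL.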

I used ffmpeg to convert my audio into the desired format:
ffmpeg -i Memo.m4a -vn -ac 1 -ar 16000 -acodec flac test.flac
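
If the conversion needs to happen from a script, the same command line can be built in PHP. This is only a sketch: `ffmpeg_flac_command` is a hypothetical helper name, and it assumes ffmpeg is installed and on the PATH.

```php
<?php
// Hypothetical helper: builds the ffmpeg command shown above for any
// input/output pair, quoting the file names so spaces don't break the shell.
function ffmpeg_flac_command($input, $output, $rate = 16000)
{
    return sprintf(
        'ffmpeg -i %s -vn -ac 1 -ar %d -acodec flac %s',
        escapeshellarg($input),
        (int) $rate,
        escapeshellarg($output)
    );
}

// Print the command; pass it to exec() to actually run the conversion.
echo ffmpeg_flac_command('Memo.m4a', 'test.flac'), "\n";
```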

So I recorded my voice on my iPhone 3GS asking “what day is it today?”, converted it to the appropriate .flac format, posted it to Google’s API, and this is what I got in response:

{"status":0,"id":"008bd1a95c3c2b04bd754da5e82949f4-1","hypotheses":[{"utterance":"what day is it today","confidence":0.91573924}]}

Sweet.
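
A response like that is easy to pick apart in PHP. The sketch below assumes the JSON always has the shape shown above. `best_utterance` is a made-up helper name, and choosing the hypothesis with the highest confidence is my own choice rather than anything the API prescribes.

```php
<?php
// Hypothetical helper: returns the utterance with the highest confidence,
// or null when the response is not valid JSON or holds no hypotheses.
function best_utterance($json)
{
    $data = json_decode($json, true);
    if (!is_array($data) || empty($data['hypotheses']) || !is_array($data['hypotheses'])) {
        return null;
    }
    $best = null;
    $bestConfidence = -1.0;
    foreach ($data['hypotheses'] as $hypothesis) {
        if (!isset($hypothesis['utterance'])) {
            continue;
        }
        // Treat a missing confidence value as 0.
        $confidence = isset($hypothesis['confidence']) ? (float) $hypothesis['confidence'] : 0.0;
        if ($confidence > $bestConfidence) {
            $bestConfidence = $confidence;
            $best = $hypothesis['utterance'];
        }
    }
    return $best;
}

$sample = '{"status":0,"id":"008bd1a95c3c2b04bd754da5e82949f4-1","hypotheses":[{"utterance":"what day is it today","confidence":0.91573924}]}';
echo best_utterance($sample), "\n"; // prints: what day is it today
```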

Database populated with Q&A

This is probably the most difficult part to obtain. Building it from scratch would require tons of data and advanced algorithms to interpret sentences constructed in various ways. I read somewhere that Siri was using Wolfram Alpha’s database… so I checked out Wolfram Alpha, and they have an engine that answers your questions. Not only that, they also offer an API service. (If you query fewer than 2,000 times a month, it’s free!) So I signed up for the API service and tested it out. I asked it some simple questions like “What day is it today?” and “Who is the president of the United States?”. It returns all answers in well-formed XML.


<?xml version='1.0' encoding='UTF-8'?>
<queryresult success='true'
    error='false'
    numpods='1'
    datatypes='City,DateObject'
    timedout=''
    timing='1.728'
    parsetiming='0.193'
    parsetimedout='false'
    recalculate=''
    id='MSP77719ii856b9090fei40000543b8b9eibb14ida&amp;s=21'
    related='http://www4d.wolframalpha.com/api/v2/relatedQueries.jsp?id=MSP77819ii856b9090fei400001d3h9h126cgaeigc&amp;s=21'
    version='2.1'>
 <pod title='Result'
     scanner='Identity'
     id='Result'
     position='200'
     error='false'
     numsubpods='1'
     primary='true'>
  <subpod title=''
      primary='true'>
   <plaintext>Friday, January 13, 2012</plaintext>
  </subpod>
 </pod>
</queryresult>

Again…. sweet.
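
Getting the answer out of that XML takes only a few lines with SimpleXML. This is a rough sketch that assumes the answer lives in the first pod marked primary='true', as in the response above. `wolfram_answer` is a hypothetical helper name, and the sample is a trimmed-down copy of that response.

```php
<?php
// Hypothetical helper: returns the plaintext of the first primary pod,
// or null if the XML cannot be parsed or the query did not succeed.
function wolfram_answer($xml)
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc === false || (string) $doc['success'] !== 'true') {
        return null;
    }
    foreach ($doc->pod as $pod) {
        if ((string) $pod['primary'] === 'true' && isset($pod->subpod->plaintext)) {
            return trim((string) $pod->subpod->plaintext);
        }
    }
    return null;
}

// Trimmed-down version of the response shown above.
$sample = "<queryresult success='true' error='false' numpods='1'>"
    . "<pod title='Result' id='Result' primary='true'>"
    . "<subpod title='' primary='true'><plaintext>Friday, January 13, 2012</plaintext></subpod>"
    . "</pod></queryresult>";
echo wolfram_answer($sample), "\n"; // prints: Friday, January 13, 2012
```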

Text to Speech

This part is easy… and Google makes it even easier with yet another undocumented API! It’s straightforward. A simple GET request to:

http://translate.google.com/translate_tts?tl=en&q=speech+to+convert
Just replace the q parameter with any sentence and you can hear Google’s female robot voice say anything you want.
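
The only thing to watch is that the sentence has to be URL-encoded before it goes into q. A minimal sketch, with `tts_url` as a made-up helper name:

```php
<?php
// Hypothetical helper: builds the text-to-speech URL for a sentence.
// http_build_query() URL-encodes the values and turns spaces into '+'.
function tts_url($sentence, $lang = 'en')
{
    return 'http://translate.google.com/translate_tts?'
        . http_build_query(array('tl' => $lang, 'q' => $sentence), '', '&');
}

echo tts_url('speech to convert'), "\n";
// prints: http://translate.google.com/translate_tts?tl=en&q=speech+to+convert
```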

Voice Input

I can make my program run either in a web browser or as a stand-alone app. Running it in the browser is cool because I would then be able to run it from just about any machine. Unfortunately, HTML5 doesn’t have a means of recording voice. My options are a) only use Google Chrome, b) make a Flash app, or c) make a Java applet.

Anywho… no big deal.

Putting It All Together


<?php 
    $stturl = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
    $wolframurl = "http://api.wolframalpha.com/v2/query?appid=[GET+YOUR+OWN+STINKIN+APP+ID]&format=plaintext&podtitle=Result&input=";
    $ttsurl = "http://translate.google.com/translate_tts?tl=en&q=master+cranky,+";

// Google Speech to Text

    // Send the raw FLAC bytes as the POST body; the Content-Type header carries the sample rate.
    $filename = "./test1.flac";
    $upload = file_get_contents($filename);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $stturl);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: audio/x-flac; rate=16000"));
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $upload);
    // Return the response as a string instead of printing it, so no output buffering is needed.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $contents = curl_exec($ch);
    curl_close($ch);
    // Take the first hypothesis, falling back to an empty string if nothing was recognized.
    $textarray = json_decode($contents, true);
    $text = isset($textarray['hypotheses'][0]['utterance']) ? $textarray['hypotheses'][0]['utterance'] : '';
    
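// Wolfram Alpha API

    $wolframurl .= urlencode($text);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $wolframurl);
    // Return the response as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $contents = curl_exec($ch);
    curl_close($ch);
    // Pull the plaintext answer out of the first pod; the cast makes $answer a plain string.
    $obj = new SimpleXMLElement($contents);
    $answer = (string) $obj->pod->subpod->plaintext;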

// Google Text to Speech

    $ttsurl .= urlencode($answer);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $ttsurl);
    // Return the audio as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $contents = curl_exec($ch);
    curl_close($ch);
    // Pass the MP3 returned by Google straight back to the caller.
    header('Content-Type: audio/mpeg');
    header('Cache-Control: no-cache');
    print $contents;
?>

It responds with this answer. Good girl.
It’s still missing the voice input portion of the code. Currently, it just accepts a .flac file. I wrote 3 chunks of code that I put together as one pipeline of an AI process. The advantage of this over Siri is that I can intervene at any time. I can have it listen for particular questions such as “who is your master?” and respond appropriately… but more importantly, I can have it listen for “Turn on my lights” or “turn on the TV” or “open the garage door” or “turn to channel 618”. Certain questions will have my bot send a signal to the appropriate Arduino-controlled light switch or garage switch or IR blaster and respond with a “yes, master”. I’ll post videos when it’s done.
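
To make the "intervene" idea concrete, here is one way the check could sit between the speech-to-text and Wolfram Alpha steps. Everything in it is a sketch under assumptions: the function names, the command table and the 192.168.1.50:8080 address are placeholders, and the `dev`/`cmd` query format follows what the author describes for the Arduino in the comments below.

```php
<?php
// Hypothetical command table: recognized phrase => device and command.
// The phrases come from the post; the dev/cmd values are illustrative.
function match_command($text)
{
    $commands = array(
        'turn on my lights'    => array('dev' => 'lights', 'cmd' => 'on'),
        'turn off the lights'  => array('dev' => 'lights', 'cmd' => 'off'),
        'turn on the tv'       => array('dev' => 'tv',     'cmd' => 'on'),
        'open the garage door' => array('dev' => 'garage', 'cmd' => 'open'),
    );
    $normalized = strtolower(trim($text));
    return isset($commands[$normalized]) ? $commands[$normalized] : null;
}

// Builds the GET URL for the Arduino; $base is a placeholder address.
function arduino_url($command, $base = 'http://192.168.1.50:8080/')
{
    return $base . '?' . http_build_query($command, '', '&');
}

// Returns the URL to call and the spoken reply for a known command.
// Both are null when the text should go on to Wolfram Alpha instead.
function handle($text)
{
    $command = match_command($text);
    if ($command !== null) {
        return array('action' => arduino_url($command), 'reply' => 'yes, master');
    }
    return array('action' => null, 'reply' => null);
}
```

An exact-match table is the simplest thing that works. Phrases with a variable part, like “turn to channel 618”, would need a pattern match instead.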

Here is a video of the prototype in action.

Updated to give you a link to a working demo. This version requires you to use the Chrome browser (thanks to Shiv Kokroo for generously providing hosting / wolfram app ID):

Working Demo

Click on the little microphone and try asking her a question like “how many legs does a spider have?” or “what is 15 + 11?” or “turn off the lights”. 🙂

Update: There is a follow-up to this post here.

The source code can be found on GitHub.

75 Comments
  1. Buddy, you are damn cool! I would like to collaborate on this!

    • Hey Shiv! Dude, that would be awesome. Just to keep you updated, I’m looking into the X10 devices so I can utilize Jarvis to automate the home. I have an Arduino + Ethernet shield programmed to accept commands from Jarvis to trigger responses for TV, lights, garage, etc.

  2. The demo is awesome.

  3. chefwear

    Any chance of the Demo being put back up?

    • Hey chefwear, I wasn’t hosting the demo and didn’t realize it was down. I’ll try to get a working prototype back up.

  4. Yoram Meijaard

    Looks awesome. The demo was nice. I’m very interested in the result.

  5. Hi Cranklin! What’s the status of this? Did you get the arduino working?

    • Actually, I did. I haven’t connected the Arduino to the appliances/lights/garage/doors yet, but I have an Arduino with an Ethernet shield that acts as a miniature intranet web server and waits for instructions via GET requests. When I speak to Jarvis, she can send GET requests to the Arduino to turn lights on/off, with requests like “http://[internal IP]:[port]/?dev=tv&cmd=on” to turn the TV on.
      I’m looking into other protocols such as X10, XBee, etc. before I finalize the project.
      I’ll post the source code for the Arduino web server and an updated Jarvis/Siri in a future post.

      • Awesome, I’ve been really interested in home automation through this method, but I was going to use AppleScript, – automate computer, home, and hopefully I’ll figure out how to use the APIs with it! I would love to look at your arduino code too!

      • Daniel

        I had the same idea for automating the home. But first I am remaking my computer. I was going to try and port Skyvie over to my computer.

  6. hey man this is so cool , i always kinda wanted to do this can i put this on my tech blog ? cheers

  7. Could you make a 100% custom server and make it sound like zazu from the lion king ? 😛
    Or even better allow “pst!” to toggle siri and have hyper sensitive sound for intimate conversations ?

    • lol David. That would be comical. I can’t get Zazu’s voice, but I did notice that when the google TTS api is triggered from a different country, the accent is different.

  8. Stefanoxr2

    Is this project dead? I’d like to pick up where you left off.

    • Hi Stefanoxr. It’s not dead, but I just haven’t had time to work on it lately because of my job. Feel free to develop it further. Everything is on GitHub, though I apologize for the lack of organization. I also added a trueknowledge API version.

  9. MichaelDealwa

    I too am building my own Jarvis and am interested in using the wolfram software to do it. Are you ok if I use some of your software as a basis like stefanoxr2? How can I access it on github? Is there a link I’m not seeing?

    • Hi Michael. How is your software coming along? You can find my source code on http://github.com/cranklin/Jarvis

      • MichaelDealwa

        Slow, man. Got to admit I’m a complete newbie to electronics and programming. I’m a quick learner and eager to tackle this project, even if it’s way beyond my current abilities. Any general tips or resources about what I should learn first? For instance, the type of code you’re using and whatnot? I’m just looking for some open resource to learn.

      • Hey Michael, there’s nothing wrong with that. I’m pretty sure you’ll grasp everything you need soon enough. If you don’t mind me asking, can you tell me what technologies and/or programming languages you are currently comfortable with?

  10. MichaelDealwa

    Well, when I said new to languages I meant, grandmotherish. As in I know how to open my email and search the web. Changing the background picture of my laptop would have been difficult for me two weeks ago. Like I said though, I have a steep learning curve and I’m out of college for the summer, so I’m already comfortable with Java, XML, and some Basic, although I’m having a tough time finding a good place to learn basic. What language are you using for your programming if you don’t mind me asking?

    • Hey Michael, I’ll use whatever language is the best fit. For example, programming the arduino microcontroller requires C (not true C as it does use objects… but similar enough)… I chose PHP for the backend of the web interface, but I could have easily used Python or another language of choice. Don’t let the language be your focus. If you’re a good programmer, you’ll be able to learn a new language on the fly.
      If I were going to make a recommendation to a new programmer, I’d recommend Python. It’s widely supported, it has many different applications, it’s fairly easy to learn, and it’s just an overall great language.

      • MichaelDealwa

        Awesome! Python and C are my next focuses then. I’ll let you know how it’s coming along in a while. 🙂 Thanks for all the help! 🙂

  11. Time lag is 3 to 4 seconds. How can one make it faster?

    • Rahul, you can disable one of the 2 AI engines. I reckon the double query will slow things down significantly.

  12. Tom

    I haven’t learned any computer languages yet and am wondering how a complete newbie like me could figure out how to do all this stuff. Thanks

    • Hi Tom, I am sorry about the late response. I have been so busy with work.
      If you haven’t learned any computer languages yet, this may all seem very overwhelming. I recommend getting your feet wet first. There are tons of online resources. codecademy.com offers some great classes that will help get you on your feet. I recommend it.

  13. harmakhis

    thanks u so much dude !! really

  14. Hey, I came across your Jarvis project while I was looking into doing something similar. Would you possibly be able to email me? I have some questions I would like to ask. Thanks in advance

  15. Matthew

    How did you get the male voice vs the google female voice?

    Thanks

    • Actually, that’s up to google. I noticed depending on your region, the google voice changes.

      • Matthew

        Thanks for the fast response. So you just set your region to what?

      • I left my region default.

    • Matthew

      Which is? I’m in the USA and its a female.

      • I’m in the USA as well and it’s a female. Where did you hear the male voice? The link that I provided is actually an Indian server. That might be why. lol

  16. Aleesha

    Syn virtual assistant is coming this April. I saw its video on YouTube, something named like Madonna virtual assistant. It’s free and made for developers. If it can be extended, I will definitely be using it, because they say it’s free.

  17. Austin Schick

    I had a bunch of trouble attempting to use the google api until somebody suggested to me that I try http instead of https. I don’t know why https was failing for me, but just in case somebody else is having problems, here’s something to try.

  18. Hello, do you run the code on a WAMP server or similar? Because PHP is server-side, I don’t know how you host it. For the Arduino to connect, don’t they both have to be on the same network? Or do you forward the requests to the Arduino from a website? Victor

  19. Hey Cranklin,
    I’m looking into getting my feet back into programming. I’ve had basic C++ experience, so I have a very general idea of what’s going on in your program. I’m interested in replicating your program here, but I want to learn what is happening at the same time and not just copy the code line for line. Is there any way you can add a few comments to the file to further explain the implementation of the APIs in the code?
    Regards

    • Hi Tiko, sorry for the late reply. I’ve been crazy busy. Yes. Actually, if you can wait, I’m re-releasing Jarvis with a lot of enhancements and it will be easier to follow.

  20. Andy Rod

    Hi, I have been wondering about your JARVIS and thought that this is really cool, but I do not know the programs you used, so if you can please tell me them I will be most pleased to make my own JARVIS! P.S. I’ve been looking for this for a while and this seems perfect!

  21. Quentin P.

    I’m really new to coding. I’m kinda confused as to what you’re coding this on, and what language, and if you’d be interested in kinda making a more step by step kind of post.

  22. Tyrone

    Hi, I have been following your Jarvis project, and it’s ridiculously cool! I would like to recreate your program. I know Java, and I wanted to ask if this could be recreated using Java or would I have to pick up some Python to recreate the project? Look forward to hearing from you. Awesome project!

    • you can use java (or other language). Just pay attention to the requests being made to google as well as the AI API. With just a little bit of work, you can easily port this to Java.

  23. savdont

    Hi, currently I know some Arduino programming and Java. Is it possible to create an application which serves as the main control panel for the Jarvis project, so that the computer responds to voice commands with the correct response?

    • savdont

      The application will most likely be made for PC

    • I’m not exactly sure what you’re asking, but yes. If you look at part 2 of this blog post, I think I’m doing what you’re asking about. I may be mistaken.

      • Savarn Dontamsetti

        Okay will look into it thanks! Do you know if this whole project can be made using a beagle bone board?

      • I haven’t tried tampering with a beagle bone board, but I’m pretty certain it can.

  24. sasuke

    Hello cranklin
    i am a newbie for programming. You have a code posted there how can i get it running?
    the PHP code you have given in the github files..

  25. how can we run the above code?? by using which software ?? pls tell me ..anybody

  26. peterMan

    Hi, I’m a UI design developer, so I’m always looking for cool projects, and I think this is very cool. I would love to help you develop this into a desktop app that people can just download and have running everywhere, so that, like Jarvis, even if you are at work you can have your AI complete tasks at home from your phone. Stuff like that.

  27. dheeraj

    Hey cranklin on what language is your project based? Please tell me.

  28. prasad

    Hey brother, I pretty much like your work and am looking forward to it, but as you say about Google’s API..
    I was checking things related to JARVIS and came across this website. I am weak in HTML, so can you please check this site? They are doing the same thing as the Google API. Please let me know whether they have used Google’s API.

  29. Good day! I just would like to offer you a big thumbs up for your
    great info you have got here on this post. I’ll be coming back to your site for more soon.

  30. It’s perfect time to make some plans for the future
    and it’s time to be happy. I have read this post and if I
    could I wish to suggest you some interesting things or advice.
    Maybe you can write next articles referring
    to this article. I wish to read more things about it!

  31. Alan

    Hi, your site is amazing! Thanks to you I have finished my version of the program. The URL no longer works because a new version of the API was released. I solved the problem by reading here -> https://github.com/gillesdemey/google-speech-v2
    For Windows users: it is not necessary to convert the audio to FLAC! You can use a .wav file!

  32. I’ve learn some good stuf here. Definitely price bookmarking for
    revisiting. I surprise how muh attemt you set to create any such great informative website.

  33. It’s fantastic that you are getting thoughts from this piece of writing as well as from
    our dialogue made at this place.

  34. Solitare

    How do you do the same in Java?

  35. NANDAKUMARAN

    I LIKE THAT

  36. Adam W Sullivan

    What if I don’t really want it to respond to me with a voice, but with text, while still understanding what I’m saying? So I speak into the microphone, and it responds on the screen with text.

  37. There’s an easy and best tutorial on Youtube to get started completely on all the necessary concepts to Build An Advanced App Like SIRI :

Trackbacks & Pingbacks

  1. Hackerspaces and General DIY « chefwear
  2. Building My Own Siri / Jarvis part 2 « cranklin.com
  3. Hacking Is So Easy, Even a Computer Can Do It « cranklin.com
  4. Artificial Intelligence Applied to Your Drone | cranklin.com
