Building My Own Siri / Jarvis

January 13, 2012

Most of the magic behind Siri happens remotely.

I want to create my OWN version of Siri… except I don’t care about having it on my phone. I want my entire house to talk to me… more like Jarvis (from Iron Man).

I believe I have access to all the right resources to create this AI.
It breaks down into three major parts:
1) convert speech to text
2) query a database populated with Q&A
3) convert text to speech
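The three parts chain into one pipeline: audio in, text, answer, audio out. Below is a minimal PHP sketch of that shape. It assumes each stage is passed in as a callable so it can be swapped out or stubbed. `run_pipeline` and the stub stages are hypothetical names, not any library’s API.

```php
<?php
// Hypothetical skeleton of the three-part pipeline: each stage is a callable,
// so any of them can be swapped out (or stubbed, as in the usage below).
function run_pipeline(string $audio, callable $speechToText, callable $answer, callable $textToSpeech): string
{
    $question = $speechToText($audio);   // 1) convert speech to text
    $reply    = $answer($question);      // 2) look up an answer
    return $textToSpeech($reply);        // 3) convert text to speech
}

// Usage with stub stages (no network calls):
$result = run_pipeline(
    'fake-flac-bytes',
    fn (string $audio): string => 'what day is it today',
    fn (string $q): string => "You asked: $q",
    fn (string $text): string => strtoupper($text)
);
echo $result, PHP_EOL;
```

Keeping the stages separate also makes it easy to inspect or intercept the text between steps 1 and 2.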

Speech to Text

Most speech-to-text engines suck. Siri’s works exceptionally well because its engine isn’t on your phone… it’s remote. I suppose we could hack Siri by running a MITM attack on an iPhone, faking the SSL cert and intercepting the Apple ID… OR we can do something much simpler. Google’s Chrome 11 browser includes a voice input function (which isn’t yet part of the HTML5 standard) that converts your speech into text. This guy discovered that the conversion happens remotely, through an undocumented API call to Google. All we have to do is call that same API and we’ve got ourselves a free speech-to-text engine!

In case you don’t read Perl (his example is written in it), this is how you use the API:

POST to: https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US

POST body: the raw contents of a .flac encoding of your voice, recorded in mono at 16000 Hz or 8000 Hz. This is not a form field: in the Perl code, Content and Content_Type are the request body and the Content-Type header.
Content-Type header: “audio/x-flac; rate=16000” (or rate=8000, to match your recording)

Response: JSON text
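Here is one way to build that request with PHP’s built-in HTTP stream wrapper instead of cURL. It is a sketch that assumes the body is the raw FLAC bytes and that the sample rate is declared only in the header. The function name `build_stt_context_options` is made up for illustration. The actual send is commented out because, per Alan’s comment below, this v1 URL has since stopped working.

```php
<?php
// Sketch: stream-context options for the speech request, assuming the FLAC
// bytes are the raw POST body and the rate is declared only in the header.
function build_stt_context_options(string $flacBytes, int $rate = 16000): array
{
    if ($rate !== 16000 && $rate !== 8000) {
        throw new InvalidArgumentException('Sample rate must be 16000 or 8000 Hz');
    }
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: audio/x-flac; rate={$rate}\r\n",
            'content' => $flacBytes,   // raw FLAC bytes, not a form field
        ],
    ];
}

$stturl  = 'https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US';
$options = build_stt_context_options('fake-flac-bytes');
// Sending it (the v1 endpoint no longer answers):
// $json = file_get_contents($stturl, false, stream_context_create($options));
```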

I used ffmpeg to convert my audio into the desired format:
ffmpeg -i Memo.m4a -vn -ac 1 -ar 16000 -acodec flac test.flac
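Here is the same conversion wrapped in a small PHP helper that spells out what each flag does. The helper name is hypothetical. `escapeshellarg()` quotes the file names so that a path with spaces can’t break the command.

```php
<?php
// Build the ffmpeg command from above with each flag commented.
function ffmpeg_to_flac_cmd(string $input, string $output, int $rate = 16000): string
{
    return implode(' ', [
        'ffmpeg',
        '-i', escapeshellarg($input),   // input file (e.g. an iPhone voice memo)
        '-vn',                          // drop any video stream
        '-ac', '1',                     // downmix to mono
        '-ar', (string) $rate,          // resample to 16000 (or 8000) Hz
        '-acodec', 'flac',              // encode as FLAC
        escapeshellarg($output),
    ]);
}

$cmd = ffmpeg_to_flac_cmd('Memo.m4a', 'test.flac');
echo $cmd, PHP_EOL;
// exec($cmd, $out, $status);  // run it once ffmpeg is installed
```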

So I recorded my voice on my iPhone 3GS asking “what day is it today?”, converted it to the appropriate .flac format, and posted it to Google’s API. This is what I got in response:

{"status":0,"id":"008bd1a95c3c2b04bd754da5e82949f4-1","hypotheses":[{"utterance":"what day is it today","confidence":0.91573924}]}
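Here is a sketch for pulling the transcription out of a response shaped like this one. It assumes a non-zero `status` means failure, and it picks the hypothesis with the highest confidence. `best_utterance` is an illustrative name.

```php
<?php
// Sketch: return the highest-confidence utterance from a speech-API response,
// assuming a non-zero "status" means the recognition failed.
function best_utterance(string $json): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data) || ($data['status'] ?? 1) !== 0) {
        return null;                       // unparseable JSON or a failed request
    }
    $best = null;
    foreach ($data['hypotheses'] ?? [] as $hypothesis) {
        if ($best === null || ($hypothesis['confidence'] ?? 0) > ($best['confidence'] ?? 0)) {
            $best = $hypothesis;           // keep the most confident guess
        }
    }
    return $best['utterance'] ?? null;     // null when there were no hypotheses
}

$sample = '{"status":0,"id":"008bd1a95c3c2b04bd754da5e82949f4-1","hypotheses":[{"utterance":"what day is it today","confidence":0.91573924}]}';
echo best_utterance($sample), PHP_EOL;
```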

Sweet.

Database populated with Q&A

This is probably the most difficult part to obtain. Building it from scratch would require tons of data and advanced algorithms to interpret sentences constructed in various ways. I read somewhere that Siri was using Wolfram Alpha’s database… so I checked out Wolfram Alpha. They have an engine that answers your questions, and they also offer an API service (if you query fewer than 2,000 times a month, it’s free!). So I signed up for the API service and tested it out. I asked it some simple questions like “What day is it today?” and “Who is the president of the United States?”. It returns all answers in well-formed XML.
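The query itself is just a GET request. Here is a sketch that builds the URL with the same parameters used in the full script further down (`appid`, `format=plaintext`, `podtitle=Result`, `input`). `http_build_query()` takes care of URL-encoding the question. The app ID is a placeholder, and `build_wolfram_url` is a made-up helper name.

```php
<?php
// Sketch: build a Wolfram|Alpha v2 query URL. The app ID is a placeholder.
function build_wolfram_url(string $appId, string $question): string
{
    return 'http://api.wolframalpha.com/v2/query?' . http_build_query([
        'appid'    => $appId,
        'format'   => 'plaintext',  // plain-text answers rather than images
        'podtitle' => 'Result',     // only ask for the "Result" pod
        'input'    => $question,
    ]);
}

echo build_wolfram_url('YOUR-APP-ID', 'what day is it today'), PHP_EOL;
```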


<?xml version='1.0' encoding='UTF-8'?>
<queryresult success='true'
    error='false'
    numpods='1'
    datatypes='City,DateObject'
    timedout=''
    timing='1.728'
    parsetiming='0.193'
    parsetimedout='false'
    recalculate=''
    id='MSP77719ii856b9090fei40000543b8b9eibb14ida&amp;s=21'
    related='http://www4d.wolframalpha.com/api/v2/relatedQueries.jsp?id=MSP77819ii856b9090fei400001d3h9h126cgaeigc&amp;s=21'
    version='2.1'>
 <pod title='Result'
     scanner='Identity'
     id='Result'
     position='200'
     error='false'
     numsubpods='1'
     primary='true'>
  <subpod title=''
      primary='true'>
   <plaintext>Friday, January 13, 2012</plaintext>
  </subpod>
 </pod>
</queryresult>
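To get the answer out of that XML, here is a sketch using SimpleXML (the same extension the full script uses). It checks `success='true'` and returns the plaintext of the pod marked `primary`, falling back to the first pod. `wolfram_answer` is an illustrative name, and the sample below is trimmed down from the response above.

```php
<?php
// Sketch: pull the answer out of a Wolfram|Alpha queryresult document.
// Returns null for malformed XML or when success isn't 'true'.
function wolfram_answer(string $xml): ?string
{
    libxml_use_internal_errors(true);            // collect parse errors instead of printing them
    $result = simplexml_load_string($xml);
    if ($result === false || (string) $result['success'] !== 'true') {
        return null;
    }
    foreach ($result->pod as $pod) {
        if ((string) $pod['primary'] === 'true') {  // prefer the pod flagged as primary
            return trim((string) $pod->subpod->plaintext);
        }
    }
    // Otherwise fall back to the first pod, if there is one.
    return isset($result->pod) ? trim((string) $result->pod->subpod->plaintext) : null;
}

$xml = <<<'XML'
<?xml version='1.0' encoding='UTF-8'?>
<queryresult success='true' error='false' numpods='1'>
 <pod title='Result' id='Result' primary='true'>
  <subpod title='' primary='true'>
   <plaintext>Friday, January 13, 2012</plaintext>
  </subpod>
 </pod>
</queryresult>
XML;

echo wolfram_answer($xml), PHP_EOL;
```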

Again… sweet.

Text to Speech

This part is easy… and Google makes it even easier with yet another undocumented API! It’s straightforward. A simple GET request to:

http://translate.google.com/translate_tts?tl=en&q=speech+to+convert
Just replace the q parameter with any URL-encoded sentence and you can hear Google’s female robot voice say anything you want.
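Building that URL by hand gets fiddly once the sentence has punctuation. Here is a sketch using `http_build_query()`. `build_tts_url` is a made-up helper, and `tl` is the language parameter from the URL above.

```php
<?php
// Sketch: build the text-to-speech URL; http_build_query() encodes spaces
// as '+' and escapes punctuation such as commas.
function build_tts_url(string $sentence, string $lang = 'en'): string
{
    return 'http://translate.google.com/translate_tts?' . http_build_query([
        'tl' => $lang,      // language of the voice
        'q'  => $sentence,  // the text to speak
    ]);
}

echo build_tts_url('speech to convert'), PHP_EOL;
```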

Voice Input

I can make my program run either in a web browser or as a stand-alone app. Running it in the web browser is cool because I would then be able to run it from just about any machine. Unfortunately, HTML5 doesn’t have a means of recording voice. My options are a) only use Google Chrome, b) make a Flash app, or c) make a Java applet.

Anywho… no big deal.

Putting It All Together


<?php
    $stturl = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
    $wolframurl = "http://api.wolframalpha.com/v2/query?appid=[GET+YOUR+OWN+STINKIN+APP+ID]&format=plaintext&podtitle=Result&input=";
    $ttsurl = "http://translate.google.com/translate_tts?tl=en&q=master+cranky,+";

// Google Speech to Text: POST the raw FLAC bytes as the request body

    $filename = "./test1.flac";
    $upload = file_get_contents($filename);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $stturl);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: audio/x-flac; rate=16000"));
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $upload);     // raw body, not a form array
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the response instead of echoing it
    $contents = curl_exec($ch);
    curl_close($ch);
    $textarray = json_decode($contents, true);
    if (empty($textarray['hypotheses'][0]['utterance'])) {
        exit("No speech recognized.");                 // bail out instead of querying with empty text
    }
    $text = $textarray['hypotheses'][0]['utterance'];

// Wolfram Alpha API

    $wolframurl .= urlencode($text);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $wolframurl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $contents = curl_exec($ch);
    curl_close($ch);
    $obj = new SimpleXMLElement($contents);
    $answer = (string) $obj->pod->subpod->plaintext;  // cast the SimpleXMLElement to a plain string

// Google Text to Speech

    $ttsurl .= urlencode($answer);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $ttsurl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $contents = curl_exec($ch);
    curl_close($ch);
    header('Content-Type: audio/mpeg');
    header('Cache-Control: no-cache');
    print $contents;
?>

It responds with this answer. Good girl.
It’s still missing the voice input portion of the code; currently, it just accepts a .flac file. I wrote three chunks of code and put them together as one pipeline of an AI process. The advantage of this over Siri is that I can intervene at any time. I can have it listen for particular questions such as “who is your master?” and respond appropriately… but more importantly, I can have it listen for “turn on my lights”, “turn on the TV”, “open the garage door” or “turn to channel 618”. Certain commands will have my bot send a signal to the appropriate Arduino-controlled light switch, garage switch or IR blaster and respond with a “yes, master”. I’ll post videos when it’s done.
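The “intervene” step could look something like this: check the transcript for home-automation phrases first, and only send it to Wolfram Alpha if nothing matches. This is a hypothetical sketch. The phrases, device names and the Arduino address are assumptions. The `?dev=…&cmd=…` URL shape follows the one described in the comments below.

```php
<?php
// Hypothetical command router: home-automation phrases are handled locally,
// everything else goes on to the Q&A engine. Phrases, device names and the
// Arduino address are illustrative assumptions.
const ARDUINO_BASE = 'http://192.168.1.50:8080';   // hypothetical intranet address

function route_command(string $text): ?array
{
    $t = strtolower(trim($text));
    if (preg_match('/\bturn (on|off) (?:my |the )?(lights?|tv)\b/', $t, $m)) {
        return ['dev' => $m[2] === 'tv' ? 'tv' : 'lights', 'cmd' => $m[1]];
    }
    if (preg_match('/\b(open|close) (?:my |the )?garage(?: door)?\b/', $t, $m)) {
        return ['dev' => 'garage', 'cmd' => $m[1]];
    }
    return null;   // not a command: send it to Wolfram Alpha instead
}

// Turn a routed command into the ?dev=...&cmd=... GET URL for the Arduino.
function arduino_url(array $command): string
{
    return ARDUINO_BASE . '/?' . http_build_query($command);
}

$command = route_command('Turn on the TV');
if ($command !== null) {
    echo arduino_url($command), PHP_EOL;   // GET this, then answer "yes, master"
}
```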

Here is a video of the prototype in action.

Update: here is a link to a working demo. This version requires the Chrome browser (thanks to Shiv Kokroo for generously providing hosting and a Wolfram app ID):

Working Demo

Click on the little microphone and try asking her a question like “how many legs does a spider have?” or “what is 15 + 11?” or “turn off the lights”. 🙂

Update: There is a follow-up to this post here.

The source code can be found on GitHub.


75 Comments

Shiv Kokroo

Buddy, you are damn cool! I would like to collaborate on this!

- cranklin
  
  Hey Shiv! Dude, that would be awesome. Just to keep you updated, I’m looking into X10 devices so I can utilize Jarvis to automate the home. I have an Arduino + Ethernet shield programmed to accept commands from Jarvis to trigger responses for the TV, lights, garage, etc.
  
  - valus
    
    Can you possibly integrate Wolfram Alpha queries into speech recognition through Perl? If so, how? Can you plz help me? I am absolutely at the beginner’s level at programming, but I’m willing to learn. I’m absolutely interested in this project.
  - cranklin
    
    http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/
    On that page you can see how he used Perl to access the Google Speech API.

Shiv Kokroo

The demo is awesome.

chefwear

Any chance of the Demo being put back up?

- cranklin
  
  Hey chefwear, I wasn’t hosting the demo and didn’t realize it was down. I’ll try to get a working prototype back up.
  
Yoram Meijaard

Looks awesome. Demo was nice. I’m very interested in the result.

reddog92396

Hi Cranklin! What’s the status of this? Did you get the Arduino working?

- cranklin
  
  Actually, I did. I didn’t actually connect the Arduino to the appliances/lights/garage/doors yet, but I have an Arduino with an Ethernet shield that acts as a miniature intranet web server and waits for instructions via GET requests. When I speak to Jarvis, she can send GET requests to the Arduino telling it to turn lights on or off, with URLs like “http://[internal IP]:[port]/?dev=tv&cmd=on” to turn the TV on.
  I’m looking into other protocols such as X10, XBee, etc. before I finalize the project.
  I’ll post the source code for the Arduino web server and an updated Jarvis/Siri in a future post.
  
  - reddog92396
    
    Awesome, I’ve been really interested in home automation through this method, but I was going to use AppleScript to automate my computer and home, and hopefully I’ll figure out how to use the APIs with it! I would love to look at your Arduino code too!
  - Daniel
    
    I had the same idea for automating the home. But first I am remaking my computer. I was going to try and port Skyvie over to my computer.

subin (@subinznz)

Hey man, this is so cool! I always kinda wanted to do this. Can I put this on my tech blog? Cheers

- cranklin
  
  Hey Subin, absolutely.
  
David Xavier

Could you make a 100% custom server and make it sound like Zazu from The Lion King? 😛
Or even better, allow “pst!” to toggle Siri and have hypersensitive sound for intimate conversations?

- cranklin
  
  lol David. That would be comical. I can’t get Zazu’s voice, but I did notice that when the Google TTS API is triggered from a different country, the accent is different.
  
Stefanoxr2

Is this project dead? I’d like to pick up where you left off.

- cranklin
  
  Hi Stefanoxr. It’s not dead, but I just haven’t had time to work on it lately because of my job. Feel free to develop it further. Everything is on GitHub, though I apologize for the lack of organization. I also added a True Knowledge API version.
  
MichaelDealwa

I too am building my own Jarvis and am interested in using the Wolfram software to do it. Are you OK with me using some of your software as a basis, like Stefanoxr2? How can I access it on GitHub? Is there a link I’m not seeing?

- cranklin
  
  Hi Michael. How is your software coming along? You can find my source code at http://github.com/cranklin/Jarvis
  
  - MichaelDealwa
    
    Slow, man. Got to admit I’m a complete newbie to electronics and programming. I’m a quick learner and eager to tackle this project, even if it’s way beyond my current abilities. Any general tips or resources about what I should learn first? For instance, the type of code you’re using and whatnot? I’m just looking for some open resource to learn from.
  - cranklin
    
    Hey Michael, there’s nothing wrong with that. I’m pretty sure you’ll grasp everything you need soon enough. If you don’t mind me asking, can you tell me what technologies and/or programming languages you are currently comfortable with?

MichaelDealwa

Well, when I said new to languages I meant grandmotherish. As in, I know how to open my email and search the web. Changing the background picture of my laptop would have been difficult for me two weeks ago. Like I said though, I have a steep learning curve and I’m out of college for the summer, so I’m already comfortable with Java, XML, and some Basic, although I’m having a tough time finding a good place to learn Basic. What language are you using for your programming, if you don’t mind me asking?

- cranklin
  
  Hey Michael, I’ll use whatever language is the best fit. For example, programming the Arduino microcontroller requires C (not true C, as it does use objects; it’s based on C++… but similar enough). I chose PHP for the backend of the web interface, but I could have easily used Python or another language of choice. Don’t let the language be your focus. If you’re a good programmer, you’ll be able to learn a new language on the fly.
  If I was going to make a recommendation to a new programmer, I’d recommend Python. It’s widely supported, it has many different applications, it’s fairly easy to learn, and it’s just an overall great language.
  
  - MichaelDealwa
    
    Awesome! Python and C are my next focus then. I’ll let you know how it’s coming along in a while. 🙂 Thanks for all the help! 🙂

rahul2047ahul

Time lag is 3 to 4 seconds. How can one make it faster?

- cranklin
  
  Rahul, you can disable one of the two AI engines. I reckon the double query slows things down significantly.
  
Tom

I haven’t learned any computer languages yet and am wondering how a complete newbie like me could figure out how to do all this stuff. Thanks

- cranklin
  
  Hi Tom, I am sorry about the late response. I have been so busy with work.
  If you haven’t learned any computer languages yet, this may all seem very overwhelming. I recommend getting your feet wet first. There are tons of online resources. codecademy.com offers some great classes that will help get you on your feet. I recommend it.
  
harmakhis

Thank you so much, dude!! Really.

- cranklin
  
  Thank you for reading!
  
Metin

Hey, I came across your Jarvis project while I was looking into doing something similar. Would you possibly be able to email me? I have some questions I would like to ask. Thanks in advance.

- cranklin
  
  Hi Metin, sure. What kind of questions did you have?
  
Matthew

How did you get the male voice vs. the Google female voice?

Thanks

- cranklin
  
  Actually, that’s up to Google. I noticed that the Google voice changes depending on your region.
  
  - Matthew
    
    Thanks for the fast response. So you just set your region to what?
  - cranklin
    
    I left my region at the default.
- Matthew
  
  Which is? I’m in the USA and it’s a female.
  
  - cranklin
    
    I’m in the USA as well and it’s a female. Where did you hear the male voice? The link that I provided is actually an Indian server. That might be why. lol

Aleesha

Syn virtual assistant is coming this April. I saw its video on YouTube, something named like Madonna virtual assistant. It’s free and made for developers. If it can be extended, I will definitely be using it, because they say it’s free.

Austin Schick

I had a bunch of trouble attempting to use the Google API until somebody suggested that I try http instead of https. I don’t know why https was failing for me, but just in case somebody else is having problems, here’s something to try.

victorachton

Hello, do you run the code on a WAMP server or similar? Because PHP is server-side, I don’t know how you host it. For the Arduino to connect, don’t they both have to be on the same network? Or do you forward the requests to the Arduino from a website? Victor

Tiko Nelson

Hey Cranklin,
I’m looking into getting my feet back into programming. I’ve had basic C++ experience, so I have a very general idea of what’s going on in your program. I’m interested in replicating your program here, but I want to learn what is happening at the same time and not just copy the code line for line. Is there any way you can add a few comments to the file to further explain the implementation of the APIs in the code?
Regards

- cranklin
  
  Hi Tiko, sorry for the late reply. I’ve been crazy busy. Yes. Actually, if you can wait, I’m re-releasing Jarvis with a lot of enhancements and it will be easier to follow.
  
Andy Rod

Hi, I have been wondering about your JARVIS and think it is really cool, but I do not know the programs you used, so if you can please tell me them, I will be most pleased to make my own JARVIS! P.S. I’ve been looking for this for a while and this seems perfect!

Quentin P.

I’m really new to coding. I’m kinda confused as to what you’re coding this on and in what language, and whether you’d be interested in making a more step-by-step kind of post.

Tyrone

Hi, I have been following your Jarvis project, and it’s ridiculously cool! I would like to recreate your program. I know Java, and I wanted to ask if this could be recreated using Java or if I would have to pick up some Python to recreate the project. Look forward to hearing from you. Awesome project!

- cranklin
  
  You can use Java (or another language). Just pay attention to the requests being made to Google as well as to the AI API. With just a little bit of work, you can easily port this to Java.
  
savdont

Hi, currently I know some Arduino programming and Java. Is it possible to create an application which serves as the main control panel for the Jarvis project, so that when voice commands are sent to the computer, it responds to each command with the correct response?

- savdont
  
  The application will most likely be made for PC.
  
- cranklin
  
  I’m not exactly sure what you’re asking, but yes. If you look at part 2 of this blog post, I think I’m doing what you’re asking about. I may be mistaken.
  
  - Savarn Dontamsetti
    
    Okay, will look into it, thanks! Do you know if this whole project can be made using a BeagleBone board?
  - cranklin
    
    I haven’t tried tinkering with a BeagleBone board, but I’m pretty certain it can.

CGone

http://www.codeproject.com/Articles/579471/How-to-Write-Your-Own-Siri-Application-Mobile-Assi

sasuke

Hello cranklin,
I am a newbie to programming. You have code posted there; how can I get it running?
The PHP code you have given in the GitHub files.

Aadhi

How can we run the above code? Using which software? Please tell me, anybody.

peterMan

Hi, I’m a UI design developer, so I’m always looking for some cool projects. I think this is very cool and would love to help you develop it into a desktop app that people can just download and have running everywhere, so that, like Jarvis, even if you are at work you can have your AI complete tasks at home from your phone, stuff like that.

dheeraj

Hey cranklin, what language is your project based on? Please tell me.

prasad

Hey brother, I pretty much like your work and am looking forward to it, but as you say about Google’s API:
As I was checking things related to JARVIS, I came across this website. I am weak in HTML, so can you please check this site? They are doing the same thing as the Google API. Please let me know whether they have used Google’s API.

- prasad
  
  I am very, very sorry, I didn’t mention the link over here. The link is http://jarone2.jarviscorp.com/newdemo.html
  Please help me find out how the website works.
  
Alan

Hi, your site is amazing! Thanks to you I have finished my version of the program. Now the URL is not working because a new version of the API was released. I solved the problem by reading here -> https://github.com/gillesdemey/google-speech-v2
For Windows users: it’s not necessary to convert the audio to FLAC! You can use a .wav file!

Solitare

How do you do the same in Java?

NANDAKUMARAN

I LIKE THAT

Adam W Sullivan

What if I don’t really want it to respond to me with a voice, but with text? It should still understand what I’m saying. So, I speak into the microphone, and it responds on the screen with text.

- Adam W Sullivan
  
  And this is Python, right?
  
vasurobo

There’s an easy and thorough tutorial on YouTube covering all the concepts you need to get started building an advanced app like Siri:
