Hackers Can Take Control of Siri and Alexa By Whispering To Them in Frequencies Humans Can't Hear (fastcodesign.com)
Chinese researchers have discovered a vulnerability in voice assistants from Apple, Google, Amazon, Microsoft, Samsung, and Huawei. It affects every iPhone and MacBook running Siri, any Galaxy phone, any PC running Windows 10, and even Amazon's Alexa assistant. From a report: Using a technique called the DolphinAttack, a team from Zhejiang University translated typical vocal commands into ultrasonic frequencies that are too high for the human ear to hear, but perfectly decipherable by the microphones and software powering our always-on voice assistants. This relatively simple translation process lets them take control of gadgets with just a few words uttered in frequencies none of us can hear. The researchers didn't just activate basic commands like "Hey Siri" or "Okay Google," though. They could also tell an iPhone to "call 1234567890" or tell an iPad to FaceTime the number. They could force a MacBook or a Nexus 7 to open a malicious website. They could order an Amazon Echo to "open the backdoor." Even an Audi Q3 could have its navigation system redirected to a new location. "Inaudible voice commands question the common design assumption that adversaries may at most try to manipulate a [voice assistant] vocally and can be detected by an alert user," the research team writes in a paper just accepted to the ACM Conference on Computer and Communications Security.
Hahahahahahah (Score:1, Insightful)
"our always-on voice assistants" -- the only thing that's always on is my refrigerator. Siri likes it when I press her button anyway. It would be interesting to do some electronic shoulder surfing at the airport though ... heh Band pass filter coming ASAP!
Re: (Score:1)
Re: (Score:3, Insightful)
Exactly. If someone is exploiting this in my house, then it means they already broke in and have complete physical access to my house, screwing around with the Echo and maybe making fraudulent Amazon orders or whatever would be the least of my concerns.
Re: Needs physical access (Score:1)
Can I inject an attack into OTA frequencies received by radios and TVs?
Nope. (Score:2)
Can I inject an attack into OTA frequencies received by radios and TVs?
Analog and digital media, respectively, can't carry such high frequencies
(the spectrum carriable by FM radio is narrower than that of the human ear. So it's the opposite: you can hear noises to which the radio is deaf)
or compress them away
(DAB+ radio and the various DVB-* TV standards use AACplus codecs. These only encode mid-range frequencies and regenerate high frequencies by replicating the spectrum. It makes total sense for compressing music - (store the base freq and the first couple of harmonics of an instrument, the
Re: (Score:2)
People claiming it can not be done should not interrupt people already doing it [arstechnica.com]. Or something...
Bandwidth, again. (Score:2)
People claiming it can not be done should not interrupt people already doing it [arstechnica.com].
(BTW: Spotify (or was it another networked music player?) is also doing it to help identify the various devices that are within reach. And as it's the app itself generating the code, it's not constrained by the audio compression limits - Vorbis in their case.
They can easily emit beep codes at 24kHz which you would definitely NOT hear, but which could be emitted and picked up by current mic and speaker technology available to the platforms on which the various instances of music player are running and would
Re: (Score:2)
Point was, the existing broadcast tech can carry the frequencies at the edge or even slightly beyond human hearing.
Can it be exploited? Of course, it can...
What if you left it charging, while you take a shower? "Honey, your phone was saying something, not sure what..."
What if it is not a purchase, but simply opening a URL — to identify you and, maybe, offer y
Speech bandwidth (Score:2)
Point was, the existing broadcast tech can carry the frequencies at the edge or even slightly beyond human hearing.
For digital TV and radio, only barely so ("at the edge")(*). Not even beyond, and there's not much space up there. (Though it might be enough to do what is a glorified form of Morse code(*), as done in the advertiser example you cite.)
Still : they cannot carry the technology mentioned in the article (DolphinAttack relies on speech being pitched up all the way > 24kHz - way beyond what is carried by TV and Radio. My "infra-red over TV" metaphor still applies). /.ers panicking "DolphinAttack could be use
So to all
Re:Needs physical access (Score:5, Interesting)
You're not thinking very creatively, since I was able to think of a variety of attacks that could use this without having physical access to the interior of your home.
For instance, they could have just dropped a small device into your pocket that every few minutes emits an inaudible command to open the garage door. You yourself would be the vector through which the attacker could attack the always-on devices in your home. In fact, it could even be something you're aware of, like a thumb drive you were given that secretly has a tiny speaker built in or that is set up to autorun a sound file with the commands when plugged into a computer.
Alternatively, a person who is known to you but who you don't realize is malicious could use this to gain physical access. Maybe you're okay taking a FaceTime call from them, but then they transmit the inaudible signal over the call, which your iDevice faithfully reproduces, resulting in Alexa, Siri, or whatever else opening your garage door. Or maybe someone standing outside at your smart doorbell uses it when you ask what they want via the app, resulting in your phone or tablet reproducing the sounds within earshot of a device that will respond to them.
A third possibility is that they could use your always-on phone to engage in an attack against your home even while you're not at home. For instance, an attacker passing you in the street could activate the commands on a device in your hand or pocket via "OK Google" or "Hey Siri" to open your garage door for a crony of theirs. For that matter, anyone who can get within listening distance of your phone can use this attack on it, all without ever having access to the devices within your home.
Er, no (Score:2)
Convoluted technical means to get your internet devices to "open the back door" are not the go-to tactic for any burglar. Nor will they be.
The go-to tactic is to kick your door really hard or break a window, then retreat. This is a basic test for a real security system - with window switches, motion sensors, a battery, a failsafe, and a separate cellular connection. Getting Alexa or whatever to "open the back door" would only act as another test for this, and actually be _harder_.
If the cops don't show u
Distance. (Score:2)
Convoluted technical means to get your internet devices to "open the back door" are not the go-to tactic for any burglar. Nor will they be.
The go-to tactic is to kick your door really hard or break a window, then retreat.
The problem is that "breaking the window" is a very noisy method that only works when the victim is away from home.
Managing to have the door opened for you - e.g. by hijacking the FM radio that is constantly blaring music in the background - could work even when the victim is in another room of the house (e.g.: taking a bath).
No. (Score:2)
You're not getting it. The _whole_point_ is to only enter the house when there is no one home. Contrary to what Hollywood feeds you, burglars have zero interest in dealing with hostages or committing murder. They want easily shiftable goods, not an armed confrontation and a bloody mess followed by huge police scrutiny.
Different environment ? (Score:2)
You're not getting it. The _whole_point_ is to only enter the house when there is no one home. Contrary to what Hollywood feeds you,
I'm not basing my scenario on what crap is currently running on the TV.
I'm basing it on what has occasionally happened around here (but very likely, the burglars around here aren't the same as the ones you have on your side of the Atlantic pond).
Something that is often seen:
An old couple goes home after buying groceries. Grandma casually leaves her handbag by the entrance door (with her purse inside, containing money and credit cards; this being an old couple, there's more cash than credit cards).
After
Re: (Score:2)
An attacker could place a speaker against a window pane and tell the device inside to unlock the doors. They could call the answer phone and talk to it that way.
Malicious ads already produce high frequency sounds that spyware on phones can track, so presumably they could just emit speech at those frequencies instead.
Re: (Score:2)
Possible, but less likely. The performance characteristics of most modern speakers on most home-quality devices would probably not give you much of a ceiling to transmit. IIRC, most headphones, for example, have a frequency response in the 100Hz-30kHz range.
It strikes me that one way to get around this is to be a bit more careful about signal processing on training - low-pass filtering might help, or doing some dynamic range compression mi
Re: (Score:2)
Limitation: Codecs for dogs (Score:2)
Alternatively, a person who is known to you but who you don't realize is malicious could use this to gain physical access. Maybe you're okay taking a FaceTime call from them, but then they transmit the inaudible signal over the call, which your iDevice faithfully reproduces, resulting in Alexa, Siri, or whatever else opening your garage door. Or maybe someone standing outside at your smart doorbell uses it when you ask what they want via the app, resulting in your phone or tablet reproducing the sounds within earshot of a device that will respond to them.
None of these can work at all, for the exact same reason the TV/radio attack mentioned above is severely limited:
FaceTime isn't designed for dogs and bats (and dolphins).
Most modern internet chat applications tend to use OPUS (e.g.: Skype, WhatsApp, Facebook, probably a few others).
This codec is optimized for carrying only audible sound/music/speech. As such the first step of OPUS is to kill all frequencies above 20kHz (no use to spend bits to encode stuff for which the ear lacks any receptor. T
Re: (Score:2)
There's also the issue that the device won't respond back at the sub-audible frequency.
Re: (Score:3)
Exactly. If someone is exploiting this in my house, then it means they already broke in and have complete physical access to my house, screwing around with the Echo and maybe making fraudulent Amazon orders or whatever would be the least of my concerns.
Yeah, I'm struggling to see the use case. Maybe a cloak-and-dagger situation where you have limited legitimate access under close scrutiny and want to plant a bug but can't do it physically; say you're a fake inspector at a drug lord's house. All you have to do is make some pretext to walk past the device with the ultrasonic command playing and it'll go to some malware site and root itself. Pretty far-fetched though...
Re:Needs physical access (Score:5, Insightful)
Exactly.
If by exactly you mean it is something completely different.
If someone is exploiting this in my house, then it means they already broke in and have complete physical access to my house,
Like if they embedded the audio in a YouTube video that you were watching? That's basically equivalent to already having broken into your house and having the run of the place, right?
And what if they are exploiting it on the phone in your pocket... you do go out of the house, right? Maybe you don't want the guy behind you at Starbucks to prank you by getting your phone to set an alarm at 2am, or order you all 180 episodes of the Golden Girls.
screwing around with the Echo and maybe making fraudulent Amazon orders or whatever would be the least of my concerns.
Or it could be the means of breaking in. Slip a tiny ultrasonic speaker under a door jamb or window sill... and tell it to unlock and open the door; perhaps it even works by holding the speaker against the window glass. Not that your front door lock is a big obstacle to a would-be thief... but do you really want your house to roll out the welcome matt to every jackass with the means to play an aac file within hearing of your home?
Re: (Score:2)
Hey! What's wrong with the Golden Girls?
Audio bandwidth (Score:2)
Like if they embedded the audio in a YouTube video that you were watching?
Bad news for this use case:
no matter what the meme says [wikipedia.org], the internet wasn't built for dogs (nor for bats or dolphins).
As such most codecs used online are optimised for human hearing range.
- OPUS will filter out anything above 20kHz.
- AACplus only replicates spectrum from mid to high range.
etc.
And most audio sources only use 48kHz sampling rate (i.e.: up to 24kHz sounds anyway).
No way to hide a secret message above the audible range: that range won't be carried.
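The 48kHz point above is the Nyquist limit: a sampled stream can only represent tones below half its sampling rate. A minimal Python sketch of that arithmetic follows. The function names are made up for illustration, and the aliasing helper assumes an idealised sampler with no anti-aliasing filter in front of it (real codecs and converters normally filter first, so the out-of-range tone is simply lost).

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sampled stream can represent (half the rate)."""
    return sample_rate_hz / 2.0


def can_carry(tone_hz: float, sample_rate_hz: float) -> bool:
    """True if a tone fits below the Nyquist limit of the stream."""
    return tone_hz < nyquist_limit_hz(sample_rate_hz)


def alias_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Where an unfiltered tone would fold to after ideal sampling."""
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)


if __name__ == "__main__":
    # Columns: tone in Hz, fits in a 48 kHz stream?, apparent frequency if unfiltered.
    for tone in (18_000, 25_000, 30_000):
        print(tone, can_carry(tone, 48_000), alias_hz(tone, 48_000))
```

Under these assumptions a 30kHz tone does not fit in a 48kHz stream at all, while a 96kHz stream would have room for it.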
And what if they are exploiting it on the phone in your pocket... you do go out of the house, right? Maybe you don't want the guy behind you at Starbucks to prank you by getting your phone to set an alarm at 2am, or order you all 180 episodes of the Golden Girls.
and when your pocket suddenly says "Okay, I'm
Re: (Score:2)
As such most codecs used online are optimised for human hearing range.
Bat call recorders predominantly use .wav, which does support it. So as long as you can get .wav you're set; and you can put wav audio (as an LPCM format) into both AVI and MP4 containers... so I think it should be pretty doable.
And most audio sources only use 48kHz sampling rate (i.e.: up to 24kHz sounds anyway).
Lots of options for ultrasonic recording -- again... the whole bat call niche has you covered (although you don't really need ultrasonic recording; you'll probably just record yourself speaking normally, and then shift it to ultrasonic in software).
and when your pocket suddenly says "Okay, I'm buying 180 episodes of Golden Girls" confirmation, you're going to notice that something fishy is happening.
Maybe. Depends how loud it is, and h
Bats vs humans. (Score:2)
As such most codecs used online are optimised for human hearing range.
Bat call recorders predominantly use .wav, which does support it. So as long as you can get .wav you're set; and you can put wav audio (as an LPCM format) into both AVI and MP4 containers... so I think it should be pretty doable.
And most audio sources only use 48kHz sampling rate (i.e.: up to 24kHz sounds anyway).
Lots of options for ultrasonic recording -- again... the whole bat call niche has you covered
I was answering a different use case.
You were speaking about carrying the ultrasonics over YouTube.
I'm pointing out that YouTube has hard limitations preventing you from carrying ultrasonics.
Of course a "custom device to blurt out ultrasonics" would work
(even a local app running on your high-end smartphone could probably switch the hardware into 96kHz audio-out mode?
Hey, we finally found a real-world use case for the 96kHz/192kHz audio-out modes that audiophiles have insisted on having !~)
For that c
Re: (Score:2)
do you really want your house to roll out the welcome matt to every jackass with the means to play an aac file within hearing of your home?
I would hope that Matt would realize they aren't supposed to be in the house. Then again, his main job is welcoming people that Siri/Alexa tell him to welcome, so how smart can he really be?
Re: (Score:2)
That is more farfetched than the average CSI plot. Maybe you should be writing for TV instead of trolling for slavery.
Re: (Score:3)
Or it means they're outside your house while you're not home, with a loud enough ultrasonic sound for your Echo to hear through the wall.
Now your door is unlocked (because you were stupid enough to hook your door locks up to the internet and have them voice controlled).
Re:Needs physical access (Score:5, Interesting)
Someone posted a tale of woe on Twitter the other day. They bought a "smart" lock, controlled via an app on their phone. The phone uses nearby wifi APs to determine location without powering up the GPS. The guy has a portable wifi AP for use when travelling...
Every time he sets up his mobile AP, anywhere in the world, his house unlocks all its doors.
Re: (Score:2)
Your imagination needs work. What if someone uses speakers? Either connected to a PC or while using the TV as a screen? You could easily have these frequencies playing off a website or in presumed bumper space at the beginning of streaming video, clips on Twitter or Streamable, etc. They could be set to very high decibels compared to the rest of the clip, and since you can't hear in that range you would still have no idea it even played, even if your volume level was set to a normal amount.
Re:Needs physical access (Score:5, Interesting)
Re: (Score:2)
Hmm... poke a speaker through your letterbox?
Turn the amp up to 11 just outside your back door? (the neighbours won't hear it - it's ultrasonic)
Play sounds through an air vent, or open window too small to climb through?
Drill a hole in the wall or door?
Re: (Score:2)
What if a shop exploits it by commanding the digital assistants of passersby to open a special website or tweet at a special account, entering whatever information the assistant knows but the attacker does not (yet)?
Even if little such information exists, the attacker's ability to hijack the browser and show coupons/specials/etc. would be a worrying development, and that's the most benign thing I can think of...
Re:Needs physical access (Score:4, Interesting)
1) Set up a personal 900 number
2) ???
3) Get on a PA system and broadcast the ultra-sonic message to call your 900 number
4) Profit!!!
The other exploit is step 3) just broadcast a normal audible message to call your 900 number
Re: (Score:2)
Spies and embassy workers wandering around whispering to another nation's mil/gov contractors?
Imagine an area in any nation filled with mil/gov contractors.
A thought experiment with trusted devices to be turned on outside secure working hours and a network of whispers waiting over a wide area.
Re: (Score:2)
You couldn't use a 900 number because that would lead back to you. But one of these nations with phone billing scams could use it to make computers call their phone network.
Re: (Score:3)
Not really. You just need remote access to something nearby with a speaker. In fact you don't even need remote access; you just need the target to play a specially prepared audio file on that speaker.
Re:Needs physical access (Score:4, Insightful)
Um, they just need to be in range of ultrasonic frequencies, which means this is exploitable anywhere on the same block as the building you're in. I hope if you live in an apartment complex all your neighbors are really really nice and trustworthy people who are close personal friends of yours.
Re: (Score:2)
There will be a quick fix, & congs to the Chin (Score:3)
... a team from Zhejiang University translated typical vocal commands into ultrasonic frequencies that are too high for the human ear to hear, but perfectly decipherable by the microphones and software powering our always-on voice assistants.
I extol the Chinese on this discovery; & let's also agree that there's likely to be a [quick] fix as it doesn't seem that complicated.
Re: (Score:2)
Fascinating information.
Re: (Score:2)
But, on the Internet, no one knows you're a dog.
Re:There will be a quick fix, & congs to the C (Score:4, Interesting)
I'm actually surprised it worked. I'd have expected one of the first things the device would do is filter out frequencies above and below human speech in order to remove as much background noise as possible. Anything ultrasonic should be discarded as it can only ever be noise, since no human can talk that high*.
* Except after getting kicked in the balls.
Re: (Score:3)
It seems this would have been filtered out before the main processing; that so many programmers would have missed doing so seems incredibly unlikely. That "whispering" in ultrasonic frequencies would have any effect at all seems even more unlikely. If they had claimed to be blasting high-volume ultrasonic sound and relying on effects like beat tones that the microphones would detect, it would at least seem possible.
Hovercraft full of eels (Score:3)
I am no sound engineer, but I don't think filtering high frequencies above speech would necessarily help their speech comprehension. Upper harmonics might well give hints to the module about the intended words. Second-language learners had more trouble understanding their non-native tongue over the old telephone networks, partly because of the filter on upper harmonics. POTS operators used the lowest bitrate they could get away with.
I assumed someone discovered a pattern to upper harmonics and is exploiting
Re: (Score:1)
I would think that both
1) Typical computer speakers wouldn't reproduce those frequencies well at all
and
2) Codecs wouldn't encode them in the first place
I'd like more info regarding iOS version (Score:2)
When Siri first came out, anyone could trigger "Hey Siri" if it was enabled. But starting with a later version of iOS (I don't remember exactly which one), you would train Siri to recognize your voice - and it seemed to work. I now can trigger my phone but not my wife's, for example. So I'm curious how this particular exploit could work on a reasonably current version of Siri.
Now the Apple Watch is another matter... and I don't recall if macOS Sierra does the voice pairing. But I'm somewhat skeptical about
Re:I'd like more info regarding iOS version (Score:5, Informative)
Who says Siri is that discriminating, even when dealing with a 'trained' voice?
The day after my wife got Siri all trained 'only' to recognize her, I could spoof her by simply talking out of the back of my throat and bumping my voice up a few octaves. I sound ridiculous, and nothing like my wife... but despite several re-trainings, I can still get her phone to do things she doesn't want.
Re:I'd like more info regarding iOS version (Score:4, Funny)
You misunderstand. This is Siri training you to talk in silly voices for its own amusement.
Re: (Score:2)
WTB Cap'n Crunch whistle PST
Not a big deal (Score:3, Informative)
Solution (hardware): RC low-pass filter.
Solution (software): FFT low-pass filter.
Bug fixed.
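The software suggestion above can be sketched as follows: transform the samples, zero every frequency bin above a cutoff, and transform back. This is only an illustration under stated assumptions. The names `dft`, `idft`, `fft_low_pass` and `demo` are invented here, the 20kHz cutoff and the test tones are arbitrary, and a naive O(n^2) transform stands in for a real FFT library so the example has no dependencies. Note that it only removes energy that is still ultrasonic when it reaches the filter; another comment in this thread argues that the attack's audio is already at voice frequencies by then.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform (fine for a short demo)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def fft_low_pass(samples, sample_rate_hz, cutoff_hz):
    """Zero every frequency bin above cutoff_hz, then transform back."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins k and n - k describe the same physical frequency.
        bin_hz = min(k, n - k) * sample_rate_hz / n
        if bin_hz > cutoff_hz:
            spectrum[k] = 0
    return idft(spectrum)


def demo():
    """Mix a 1 kHz 'voice' tone with a 30 kHz tone, then filter at 20 kHz."""
    rate, n = 96_000, 96  # 96 samples at 96 kHz gives 1 kHz per bin
    voice = [math.sin(2 * math.pi * 1_000 * t / rate) for t in range(n)]
    ultra = [math.sin(2 * math.pi * 30_000 * t / rate) for t in range(n)]
    mixed = [v + u for v, u in zip(voice, ultra)]
    filtered = fft_low_pass(mixed, rate, 20_000)
    # Largest difference between the filtered signal and the voice tone alone.
    return max(abs(f - v) for f, v in zip(filtered, voice))


if __name__ == "__main__":
    print(demo())  # a value very close to zero: only the 1 kHz tone survives
```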
Re: (Score:3)
Re: (Score:2)
I don't think it's sampling that high - perhaps 48kHz or so. And it's only doing that because that's the default sampling rate of the codec or digital microphone in use. Changing the rate, especially of modern digital microphones can be complicated so it's usually easier to resample it later.
And it's probably not ultrasonic - if you simply move human speech of around 500Hz-4kHz
How it all started (Score:1)
"Alexa, kill all humans."
my time to shine (Score:2)
Re: (Score:3)
YAY! My useless superpower to hear up to around 30-35KHz will come in handy for things other than knowing if someone left a CRT television on! I can now detect "dolphin attacks" apparently.
and numerous AC/DC adapters, and faulty capacitors. And the fun of returning loud and obnoxious devices that a vendor can't hear.
Re: (Score:2)
and numerous AC/DC adapters, and faulty capacitors. And the fun of returning loud and obnoxious devices that a vendor can't hear.
It's OK, since we got LCDs the CRT whine has been replaced by a 60Hz hum that anyone can hear.
Re: (Score:2)
Most of us make do with ~20kHz hearing to detect the coil whine of CRTs; actually, 16kHz is enough.
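For reference, the "16kHz is enough" figure matches the horizontal scan rate of analog TV, which is where CRT whine comes from: lines per frame times frames per second. A small sketch of that arithmetic, with a hypothetical helper name and the standard NTSC and PAL figures:

```python
from fractions import Fraction


def line_rate_hz(lines_per_frame, frames_per_second):
    """Horizontal scan rate: how many lines the beam draws each second."""
    return lines_per_frame * frames_per_second


# NTSC colour: 525 lines at 30000/1001 (about 29.97) frames per second.
NTSC_LINE_RATE = line_rate_hz(525, Fraction(30_000, 1_001))
# PAL: 625 lines at 25 frames per second.
PAL_LINE_RATE = line_rate_hz(625, 25)

if __name__ == "__main__":
    print(round(float(NTSC_LINE_RATE), 2), PAL_LINE_RATE)
```

Both rates come out a little under 16kHz, well inside the range most younger listeners can hear.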
2600 (Score:2, Insightful)
Cap'n Crunch called, he wants his attack vector back.
Re: (Score:1)
Can they make them work well and be useful? (Score:2)
Maybe the hackers can make these voice assistants actually work well (i.e. Siri), and do something actually useful?
Always ... (Score:5, Funny)
[ I hope you all like creamed corn. ]
One would think... (Score:2)
Re: (Score:2)
Re:One would think... (Score:5, Informative)
That input to a voice recognition system would be run through a notch (bandpass) filter only a little wider than human vocal range.
The point of the attack is that they're using the nonlinearity of the mechanical microphone to "mix" the ultrasonic carrier and sidebands to produce "demodulated" audio on the microphone output. Though there is no "baseband" audio in the air, that demodulated audio IS baseband. So no amount of filtering will separate it from a real voice signal.
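The demodulation effect described above can be illustrated numerically. The sketch below is a toy model, not the researchers' setup: it assumes the microphone's nonlinearity can be approximated by adding a small quadratic term to a linear response (a common textbook simplification), and every constant in it (carrier, test tone, modulation depth, quadratic gain) is an illustrative value chosen here. It amplitude-modulates a 1kHz tone onto a 30kHz carrier, then measures how much 1kHz energy exists before and after the modelled microphone.

```python
import cmath
import math

SAMPLE_RATE_HZ = 192_000
NUM_SAMPLES = 1_920      # 10 ms, so every tone below completes whole cycles
CARRIER_HZ = 30_000      # inaudible carrier (assumed value)
VOICE_HZ = 1_000         # single tone standing in for the spoken command
MOD_DEPTH = 0.5          # AM modulation depth (assumed value)
QUADRATIC_GAIN = 0.1     # strength of the microphone's squared term (assumed value)


def am_ultrasound():
    """Amplitude-modulate the voice tone onto the ultrasonic carrier."""
    out = []
    for t in range(NUM_SAMPLES):
        voice = math.cos(2 * math.pi * VOICE_HZ * t / SAMPLE_RATE_HZ)
        carrier = math.cos(2 * math.pi * CARRIER_HZ * t / SAMPLE_RATE_HZ)
        out.append((1 + MOD_DEPTH * voice) * carrier)
    return out


def nonlinear_microphone(signal):
    """Toy microphone: linear response plus a small quadratic term."""
    return [s + QUADRATIC_GAIN * s * s for s in signal]


def tone_amplitude(signal, tone_hz):
    """Amplitude of one frequency component (a single DFT bin)."""
    n = len(signal)
    acc = sum(
        s * cmath.exp(-2j * math.pi * tone_hz * t / SAMPLE_RATE_HZ)
        for t, s in enumerate(signal)
    )
    return 2 * abs(acc) / n


if __name__ == "__main__":
    in_air = am_ultrasound()
    at_mic_output = nonlinear_microphone(in_air)
    print("1 kHz in the air:     ", tone_amplitude(in_air, VOICE_HZ))
    print("1 kHz after the mic:  ", tone_amplitude(at_mic_output, VOICE_HZ))
```

In this model the signal in the air has essentially no 1kHz component, while the microphone output has one of amplitude QUADRATIC_GAIN * MOD_DEPTH, because squaring the modulated carrier produces a copy of the envelope at baseband. That copy sits in the voice band, which is why a low-pass filter placed after the microphone cannot remove it.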
Re: (Score:2)
Re: (Score:2)
That said, there are probably not that many different mics used in phones, so tuning for a large s
wtf? (Score:2)
They could order an Amazon Echo to "open the backdoor."
If you're not home and someone says "open the back door" loud enough for Alexa to hear it, you've fucked yourself anyway.
Pro tip: Don't control your security system/door locks with a voice system anyone can use. You may as well have the doorbell unlock the door.
Try saying "Are you serious?" (Score:4, Interesting)
A few days ago, I happened to be reading something online and paused and said to myself aloud, "Are you serious?"
And suddenly, my iPhone — which was far across the room and plugged in — lit up and Siri asked me what I wanted.
Apparently, "Are you serious" sounds like "Hey, Siri."
Re: (Score:1)
Apparently, "Are you serious" sounds like "Hey, Siri."
Well, it does if you have a nasal American accent.
Re: (Score:2)
A few days ago, I happened to be reading something online and paused and said to myself aloud, "Are you serious?"
And suddenly, my iPhone — which was far across the room and plugged in — lit up and Siri asked me what I wanted.
Apparently, "Are you serious" sounds like "Hey, Siri."
I've had no luck reproducing this. I thought perhaps "Siri" would be enough on its own (since, depending on pronunciation, "serious" has a "Siri" in it), but that didn't work either.
I think the key is "far across the room." There may have been enough uncertainty at a distance with "Are You" but the phone recognized "Siri(ous)" and assumed it was a wake up call. Or Siri just thought you were drunk again!
Re: (Score:2)
I've had no luck reproducing this. I thought perhaps "Siri" would be enough on its own (since, depending on pronunciation, "serious" has a "Siri" in it), but that didn't work either.
The way I said it was less like "Are ... you ... serious?" and more like "Aryoo searees?" That is, I said it very quickly and didn't enunciate.
I think the key is "far across the room." There may have been enough uncertainty at a distance with "Are You" but the phone recognized "Siri(ous)" and assumed it was a wake up call. Or Siri just thought you were drunk again!
Well, I've repeated it several times and it does the same thing at any distance close or far.
Re: (Score:2)
A few days ago, I happened to be reading something online and paused and said to myself aloud, "Are you serious?" And suddenly, my iPhone, which was far across the room and plugged in, lit up and Siri asked me what I wanted. Apparently, "Are you serious" sounds like "Hey, Siri."
Yes, but, were they serious?
Easiest fix ever (Score:2)
Re: (Score:1)
then you'll be vulnerable to a Barry White attack.
The attack they've used is ultrasonic only, and uses harmonics to make the system 'think' it's hearing normal human voices, when actually there is none. Believe it or not, it's actually cleverer than you.
You can ultrasonically jam them too (Score:2)
I noticed that when I am running my ultrasonic cleaner, Siri becomes almost completely unable to recognize my words. It knows I am speaking and detects word breaks but the accuracy drops to the point of uselessness even 5-6 feet from the source of the sound.
I haven't checked, but it should be running in the 35-40 kHz range.
How is this news... (Score:1)
You have a device that controls your home that responds to voice commands, and someone can "hack it" by giving it voice commands. How is this news?
BTW: This is why I don't have a voice activated system controlling my house / phone / computer / whatever.
Notice: Beep Boop Buup! (Score:1)