AI Iphone Privacy Security

Hackers Can Take Control of Siri and Alexa By Whispering To Them in Frequencies Humans Can't Hear (fastcodesign.com) 116

Chinese researchers have discovered a vulnerability in voice assistants from Apple, Google, Amazon, Microsoft, Samsung, and Huawei. It affects every iPhone and Macbook running Siri, any Galaxy phone, any PC running Windows 10, and even Amazon's Alexa assistant. From a report: Using a technique called the DolphinAttack, a team from Zhejiang University translated typical vocal commands into ultrasonic frequencies that are too high for the human ear to hear, but perfectly decipherable by the microphones and software powering our always-on voice assistants. This relatively simple translation process lets them take control of gadgets with just a few words uttered in frequencies none of us can hear. The researchers didn't just activate basic commands like "Hey Siri" or "Okay Google," though. They could also tell an iPhone to "call 1234567890" or tell an iPad to FaceTime the number. They could force a Macbook or a Nexus 7 to open a malicious website. They could order an Amazon Echo to "open the backdoor." Even an Audi Q3 could have its navigation system redirected to a new location. "Inaudible voice commands question the common design assumption that adversaries may at most try to manipulate a [voice assistant] vocally and can be detected by an alert user," the research team writes in a paper just accepted to the ACM Conference on Computer and Communications Security.
This discussion has been archived. No new comments can be posted.

  • Hahahahahahah (Score:1, Insightful)

    by Anonymous Coward

    "our always-on voice assistants" -- the only thing that's always on is my refrigerator. Siri likes it when I press her button anyway. It would be interesting to do some electronic shoulder surfing at the airport though ... heh Band pass filter coming ASAP!

    • That moment when you're pinned to the floor with a cold iron shotgun barrel against your neck, right as you board the plane, "for calling in a bomb threat"? Now I remember why I keep my phone number on a €15 dumbphone ... I knew there was something awry ... I just couldn't put my finger on it.
  • by bogaboga ( 793279 ) on Wednesday September 06, 2017 @08:14PM (#55150781)

    ... a team from Zhejiang University translated typical vocal commands into ultrasonic frequencies that are too high for the human ear to hear, but perfectly decipherable by the microphones and software powering our always-on voice assistants.

    I extol the Chinese on this discovery; & let's also agree that there's likely to be a [quick] fix as it doesn't seem that complicated.

    • Fascinating information.

    • by msauve ( 701917 )
      " translated typical vocal commands into ultrasonic frequencies that are too high for the human ear to hear, but perfectly decipherable by the microphones and software powering our always-on voice assistants."

      But, on the Internet, no one knows you're a dog.
    • by AmiMoJo ( 196126 ) <mojo@NOSpam.world3.net> on Thursday September 07, 2017 @07:34AM (#55152451) Homepage Journal

      I'm actually surprised it worked. I'd have expected one of the first things the device would do is filter out frequencies above and below human speech in order to remove as much background noise as possible. Anything ultrasonic should be discarded as it can only ever be noise, since no human can talk that high*.

      * Except after getting kicked in the balls.

      • by Megol ( 3135005 )

        It seems this would have been filtered before the main processing; that so many programmers would have missed doing it seems incredibly unlikely. That "whispering" in ultrasonic frequencies would have any effect at all seems even more unlikely. If they had claimed to be blasting high-volume ultrasonic sound and relying on effects like beat tones that the microphones would detect, it would at least seem possible.

      • I am no sound engineer, but I don't think filtering high frequencies above speech would necessarily help their speech comprehension. Upper harmonics might well give hints to the module about the intended words. Second-language learners had more trouble understanding their non-native tongue over the old telephone networks, partly because of the filter on upper harmonics. POTS operators used the lowest bitrate they could get away with.

        I assumed someone discovered a pattern to upper harmonics and is exploiting

      • I would think that both

        1) Typical computer speakers wouldn't reproduce those frequencies well at all
        and
        2) Codecs wouldn't encode them in the first place

  • When Siri first came out, anyone could trigger "Hey Siri" if it was enabled. But starting with a later version of iOS (I don't remember exactly which one), you would train Siri to recognize your voice - and it seemed to work. I now can trigger my phone but not my wife's, for example. So I'm curious how this particular exploit could work on a reasonably current version of Siri.

    Now the Apple Watch is another matter... and I don't recall if macOS Sierra does the voice pairing. But I'm somewhat skeptical about

  • Not a big deal (Score:3, Informative)

    by Anonymous Coward on Wednesday September 06, 2017 @08:19PM (#55150819)

    Solution (hardware): RC low-pass filter.
    Solution (software): fft low-pass filter.
    bug fixed.
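A rough sketch of what the software fix above could look like, using the discrete-time equivalent of a first-order RC low-pass rather than an FFT. The sample rate, the 4 kHz cutoff, and the function names are all assumptions made for illustration and are not taken from any real assistant. Note that this only removes ultrasonic content that is still ultrasonic when it reaches the filter; elsewhere in this thread it is argued that the attack is already at baseband by then.

```python
import math

SAMPLE_RATE = 192_000  # assumed; high enough to represent a 30 kHz tone
CUTOFF_HZ = 4_000      # assumed cutoff just above the main speech band
N = 19_200             # 0.1 s of samples


def rc_lowpass(samples, sample_rate, cutoff_hz):
    """Single-pole IIR filter, the discrete analogue of an RC low-pass."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)  # move the output a fraction alpha toward the input
        out.append(y)
    return out


def tone(freq_hz, sample_rate, n):
    """Unit-amplitude sine wave at freq_hz."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gain(freq_hz):
    """Ratio of output RMS to input RMS for a pure tone at freq_hz."""
    x = tone(freq_hz, SAMPLE_RATE, N)
    return rms(rc_lowpass(x, SAMPLE_RATE, CUTOFF_HZ)) / rms(x)


voice_gain = gain(1_000)        # a tone inside the speech band
ultrasonic_gain = gain(30_000)  # a tone above human hearing

print(f"gain at 1 kHz:  {voice_gain:.3f}")
print(f"gain at 30 kHz: {ultrasonic_gain:.3f}")
```

A first-order filter rolls off gently (about 6 dB per octave), so the 30 kHz tone is reduced rather than eliminated; a steeper filter would be needed to suppress it further.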

  • "Alexa, kill all humans."

  • YAY! My useless superpower to hear up to around 30-35 kHz will come in handy for things other than knowing if someone left a CRT television on! I can now detect "dolphin attacks" apparently.
    • YAY! My useless superpower to hear up to around 30-35 kHz will come in handy for things other than knowing if someone left a CRT television on! I can now detect "dolphin attacks" apparently.

      and numerous AC/DC adapters, and faulty capacitors. And the fun of returning loud and obnoxious devices that a vendor can't hear.

      • and numerous AC/DC adapters, and faulty capacitors. And the fun of returning loud and obnoxious devices that a vendor can't hear.

        It's OK, since we got LCDs the CRT whine has been replaced by a 60Hz hum that anyone can hear.

    • by Megol ( 3135005 )

      Most of us make do with ~20kHz hearing to detect the coil whine of CRTs, actually 16kHz is enough.

  • 2600 (Score:2, Insightful)

    by Anonymous Coward

    Cap'n Crunch called, he wants his attack vector back.

    • 2600 Hz is well within audio range, and the systems were supposed to respond to said frequency, so basically nothing like this at all.
  • Maybe the hackers can make these voice assistants actually work well (i.e. Siri), and do something actually useful?

  • Always ... (Score:5, Funny)

    by fahrbot-bot ( 874524 ) on Wednesday September 06, 2017 @09:27PM (#55151123)

    ... Listening [xkcd.com]

    [ I hope you all like creamed corn. ]

  • That input to a voice recognition system would be run through a notch (bandpass) filter only a little wider than human vocal range. It just seems like such a simple way to help sanitize the input.
    • Wondering why they didn't do that in the first place to reduce background noise.
    • by Ungrounded Lightning ( 62228 ) on Wednesday September 06, 2017 @11:07PM (#55151433) Journal

      That input to a voice recognition system would be run through a notch (bandpass) filter only a little wider than human vocal range.

      The point of the attack is that they're using the nonlinearity of the mechanical microphone to "mix" the ultrasonic carrier and sidebands to produce "demodulated" audio on the microphone output. Though there is no "baseband" audio in the air, that demodulated audio IS baseband. So no amount of filtering will separate it from a real voice signal.
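To make the mixing argument concrete, here is a toy numerical sketch. It assumes the microphone can be modeled as its input plus a small squared term (a deliberately crude stand-in for real hardware), and it uses a single 1 kHz tone in place of a voice command. The carrier frequency, modulation depth, and nonlinearity strength are made-up illustrative values, not figures from the paper.

```python
import cmath
import math

SAMPLE_RATE = 192_000  # assumed; keeps every product term below Nyquist
N = 19_200             # 0.1 s, an integer number of cycles for every tone
CARRIER_HZ = 30_000    # assumed ultrasonic carrier
VOICE_HZ = 1_000       # single tone standing in for a voice command
DEPTH = 0.5            # assumed modulation depth
K2 = 0.1               # assumed strength of the quadratic term


def am_signal():
    """Amplitude-modulated carrier: all energy sits at 29, 30 and 31 kHz."""
    out = []
    for n in range(N):
        t = n / SAMPLE_RATE
        envelope = 1.0 + DEPTH * math.cos(2.0 * math.pi * VOICE_HZ * t)
        out.append(envelope * math.cos(2.0 * math.pi * CARRIER_HZ * t))
    return out


def toy_microphone(samples):
    """Hypothetical nonlinear response: linear term plus a small squared term."""
    return [x + K2 * x * x for x in samples]


def tone_amplitude(samples, freq_hz):
    """Amplitude of one frequency component, via a single-bin DFT."""
    acc = 0j
    for n, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE)
    return 2.0 * abs(acc) / len(samples)


in_air = am_signal()
mic_out = toy_microphone(in_air)

baseband_in_air = tone_amplitude(in_air, VOICE_HZ)
baseband_at_mic = tone_amplitude(mic_out, VOICE_HZ)

print(f"1 kHz amplitude in the air:       {baseband_in_air:.6f}")
print(f"1 kHz amplitude after microphone: {baseband_at_mic:.6f}")
```

Squaring the modulated carrier produces a term proportional to the envelope itself, so under these assumptions the 1 kHz component appears with amplitude K2 * DEPTH even though nothing at 1 kHz was ever transmitted. A low-pass or bandpass stage after this point would pass that component just as it passes a real voice.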

      • So, if the "wakeup" command woke up a plasma arc microphone [digilentinc.com] and only those further commands confirmed by both microphones are accepted, would that prevent this attack?
      • by wbr1 ( 2538558 )
        That is very interesting, thanks. I am not well versed in audio physics, but wouldn't this be reliant on the physical properties of the microphone, i.e. resonant frequencies, harmonics, etc.? If that is correct, it should not work on all microphones, since they have different sizes, diaphragms, housings, etc. You would have to tune the ultrasonic signal to trigger the sideband/harmonic within a particular mic.

        That said, there are probably not that many different mics used in phones, so tuning for a large s

  • They could order an Amazon Echo to "open the backdoor."

    If you're not home and someone says "open the back door" loud enough for Alexa to hear it, you've fucked yourself anyway.

    Pro tip: Don't control your security system/door locks with a voice system anyone can use. You may as well have the doorbell unlock the door.

  • by flargleblarg ( 685368 ) on Wednesday September 06, 2017 @11:01PM (#55151417)
    A few days ago, I happened to be reading something online and paused and said to myself aloud, "Are you serious?"

    And suddenly, my iPhone — which was far across the room and plugged in — lit up and Siri asked me what I wanted.

    Apparently, "Are you serious" sounds like "Hey, Siri."
    • by Anonymous Coward

      Apparently, "Are you serious" sounds like "Hey, Siri."

      Well, it does if you have a nasally american accent.

    • A few days ago, I happened to be reading something online and paused and said to myself aloud, "Are you serious?"

      And suddenly, my iPhone — which was far across the room and plugged in — lit up and Siri asked me what I wanted.

      Apparently, "Are you serious" sounds like "Hey, Siri."

      I've had no luck reproducing this. I thought perhaps "Siri" would be enough on its own (since, depending on pronunciation, "serious" has a "Siri" in it), but that didn't work either.

      I think the key is "far across the room." There may have been enough uncertainty at a distance with "Are you," but the phone recognized "Siri(ous)" and assumed it was a wake-up call. Or Siri just thought you were drunk again!

      • I've had no luck reproducing this. I thought perhaps "Siri" would be enough on its own (since, depending on pronunciation, "serious" has a "Siri" in it), but that didn't work either.

        The way I said it was less like "Are ... you ... serious?" and more like "Aryoo searees?" That is, I said it very quickly and didn't enunciate.

        I think the key is "far across the room." There may have been enough uncertainty at a distance with "Are you," but the phone recognized "Siri(ous)" and assumed it was a wake-up call. Or Siri just thought you were drunk again!

        Well, I've repeated it several times and it does the same thing at any distance close or far.

    • A few days ago, I happened to be reading something online and paused and said to myself aloud, "Are you serious?" And suddenly, my iPhone — which was far across the room and plugged in — lit up and Siri asked me what I wanted. Apparently, "Are you serious" sounds like "Hey, Siri."

      Yes, but, were they serious?

  • Ignore all voice commands over, say, 500 Hz.
    • by Anonymous Coward

      then you'll be vulnerable to a Barry White attack.

      The attack they've used is ultrasonic only, and uses harmonics to make the system 'think' it's hearing normal human voices, when actually there is none. Believe it or not, it's actually cleverer than you.

  • I noticed that when I am running my ultrasonic cleaner, Siri becomes almost completely unable to recognize my words. It knows I am speaking and detects word breaks but the accuracy drops to the point of uselessness even 5-6 feet from the source of the sound.

    I haven't checked, but it should be running in the 35-40 kHz range.

  • You have a device that controls your home that responds to voice commands, and someone can "hack it" by giving it voice commands. How is this news?

    BTW: This is why I don't have a voice activated system controlling my house / phone / computer / whatever.

  • Most of you here are overlooking that if you transmit "Hey Google," "Hey Siri," or "Hey Alexa," the assistant responds with its beeps to signal that it is listening and you can talk. That beep is audible to the owner, and you will probably also get spoken feedback such as "OK, opening garage door." So the owner will almost certainly notice, which rules out attacks delivered through YouTube, voice chat, and the like. But if the owner is not home (which is the whole point of the garage door scenario), an attacker could still leave a tiny speaker lying around!
