AI Google Security Apple

Siri, Alexa, and Google Assistant Can Be Controlled By Inaudible Commands (venturebeat.com) 100

Apple's Siri, Amazon's Alexa, and Google's Assistant were meant to be controlled by live human voices, but all three AI assistants are susceptible to hidden commands undetectable to the human ear, researchers in China and the United States have discovered. From a report: The New York Times reports today that the assistants can be controlled using subsonic commands hidden in radio music, YouTube videos, or even white noise played over speakers, a potentially huge security risk for users. According to the report, the assistants can be made to dial phone numbers, launch websites, make purchases, and access smart home accessories -- such as door locks -- at the same time as human listeners are perceiving anything from completely different spoken text to recordings of music.

In some cases, assistants can be instructed to take pictures or send text messages, receiving commands from up to 25 feet away through a building's open windows. Researchers at Berkeley said that they can modestly alter audio files "to cancel out the sound that the speech recognition system was supposed to hear and replace it with a sound that would be transcribed differently by machines while being nearly undetectable to the human ear."

This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward

    This is not "news" because it's not "new"

    It's been known since September 2017: https://www.infosecurity-magazine.com/news/ultrasonic-dolphinattack-hack-voice/

    Funny how the original research listed only Chinese researchers. Now, NYT attributes this research to some Berkeley guys, which is highly inaccurate. The DolphinAttack was the sole creation of the Chinese research team.

    • Did you read the article or just jump on the fact that prior research in this area negates the latest findings?

      The article credits the Chinese teams for their research in 2016. However, this story references new and recently published research applicable to real world attacks using almost any audio source. Security implications of this ongoing research are worrisome.

  • by UnknownSoldier ( 67820 ) on Thursday May 10, 2018 @01:07PM (#56589718)

    I wonder how long before we get inaudible malware / trolled -- Alexa add big hairy balls to my shopping list! [youtube.com]

  • of course it does (Score:5, Insightful)

    by vux984 ( 928602 ) on Thursday May 10, 2018 @01:10PM (#56589728)

    And really most of this stuff is just as bad even if it is audible. It just means one has to figure out when you aren't home before they hold a speaker up to your mail slot / under the door / up to a window.

    And how are they going to secure it? Voiceprints -- we already have software that can defeat voiceprinting with a small sample. Passwords? That you have to say aloud every time you use the device? That's pretty much pointless.

    This type of technology is fundamentally broken and from what I can see so far, it cannot be fixed.

    • Re:of course it does (Score:4, Interesting)

      by skids ( 119237 ) on Thursday May 10, 2018 @01:25PM (#56589836) Homepage

      Some talented screenwriter could probably make a good movie screenplay out of a battle-royale between Siri and Alexa and Okaygoogle all trying to sabotage each other, meanwhile ruining the life of their owner. (And then get the companies to buy the rights so it'll never get shot)

      • Some talented screenwriter could probably make a good movie screenplay out of a battle-royale between Siri and Alexa and Okaygoogle

        And even if one doesn't, there's always George Lucas.

      • South Park already did it. And yes, it activated devices in people's homes.
      • by jiriw ( 444695 )

        Why do I have Pixar in mind? "Toy Story 5 - Electronic Warfare" featuring such lovely side-characters as baking assistant Aunt Alexa, Siri the drama queen and of course the gardener, Google Gnome.

    • by Anonymous Coward

      Google Home Mini recognizes individual voices, so private information will stay that way. Wouldn't be surprised if an "only registered voices" option comes around if this becomes too exploited

      • by vux984 ( 928602 )

        We already have the technology to synthesize voices using a short sample.

        https://www.theverge.com/2017/... [theverge.com]

        What are you going to do when your voiceprint is hacked? Get a new voice?

        • by jiriw ( 444695 )

          To be fair, this is about inaudible commands, which I doubt have a matching voice print with an existing human voice. Your 'problem' already was a 'problem'.

          • by vux984 ( 928602 )

            To be fair, the very first sentence in this thread:

            "And really most of this stuff is just as bad even if it is audible."

    • by green1 ( 322787 )

      Voiceprints aren't perfect, but they do a good job of defeating anything that's crafted to blanket a large number of users.

      If voiceprints are used, you couldn't for example, simply air a commercial on TV that makes millions of devices order a product.

      Basically it's a hugely effective method of blocking spam.

      That said, you are correct that it's basically useless against a determined attack on a specific individual, but so are door locks and I don't see people advocating that we should get rid of those.

      Securi

      • by vux984 ( 928602 )

        Security does not need to be, nor should it ever be, an all or nothing approach.

        100% Agreed.

        But the difference between a physical door and an amazon echo is that I absolutely do need a door and I absolutely don't need an amazon echo.

        So I absolutely do need to balance security with effectiveness with convenience with expense... and voila we have various door locks.

        I don't need a voice assistant. And the convenience afforded by not having to reach for the remote to pause a movie or to not have to take my phone out of my pocket to dial it doesn't merit the kind of security compromises o

        • by green1 ( 322787 )

          If you use that logic we would have had no technological progress, ever. No invention ever solved a monumental problem on day one; everything has been incremental improvements to things over time. Nobody thought that we needed to have a computer in our pocket at all times and yet people really enjoy having that at this point. This message is being composed entirely by voice. Something you say we don't need, and I'll agree we don't need it, that doesn't mean we don't want it, or that it doesn't improve our

          • Using your logic, civilization would collapse. GP listed serious problems with a certain technology that aren't currently fixed (even if they're fixable) and decided not to use it. GP said he didn't need voice control, not that he wouldn't want it if it were actually secure.

            • by green1 ( 322787 )

              His argument was that there was no use case for it, and that it would not be possible to secure it. That's very different from saying they want it to become more secure.

              • by vux984 ( 928602 )

                I didn't say there was no use case for it. I said the use cases were not important, and that the risk/security situation and compromises to use for its use cases don't make any sense.

                If, for example, you are paralyzed from the neck down, your situation is quite different, and the added convenience of voice commands to your quality of life makes it worth accepting the security risks. But if you are able-bodied, it's absurd to accept the current security risks in exchange for the relatively trivial conveniences

  • by cyberchondriac ( 456626 ) on Thursday May 10, 2018 @01:14PM (#56589750) Journal

    TFA seems to indicate they believe this to be an unexpected and curious flaw in the software, but the fact that this works as well as it does, from up to 25 feet away, is inaudible to humans, and nearly all these PA devices can hear and respond to these types of ostensibly surreptitious commands.. well, maybe I'm paranoid, but maybe they just stumbled onto another NSA backdoor. Or even a Google/Apple/Amazon backdoor.
    I find this creepy and suspicious as hell.

    • by Carewolf ( 581105 ) on Thursday May 10, 2018 @01:17PM (#56589778) Homepage

      TFA seems to indicate they believe this to be an unexpected and curious flaw in the software, but the fact that this works as well as it does, from up to 25 feet away, is inaudible to humans, and nearly all these PA devices can hear and respond to these types of ostensibly surreptitious commands.. well, maybe I'm paranoid, but maybe they just stumbled onto another NSA backdoor. Or even a Google/Apple/Amazon backdoor.
      I find this creepy and suspicious as hell.

      No just a result of masquerading corporate spydevices as smart home devices with AI. They are not smart and they are not working for you.

  • by jittles ( 1613415 ) on Thursday May 10, 2018 @01:14PM (#56589758)

    Researchers at Berkeley said that they can modestly alter audio files "to cancel out the sound that the speech recognition system was supposed to hear and replace it with a sound that would be transcribed differently by machines while being nearly undetectable to the human ear."

    But did these so-called researchers see what Siri, Alexa, and Google Assistant do when they play the audio clip backwards? What kind of half-assed research is this?

  • Anyone know a good tool to play commands to Alexa in an inaudible range? My goals are mostly harmless.

    "Alexa Simon Says, Kids go do your homework!"

    That kind of thing.

  • by DogDude ( 805747 ) on Thursday May 10, 2018 @01:27PM (#56589870)
    They're already controlled by inaudible commands. Ethernet packets are silent. Do people think they "control" these things? How fucking stupid do you have to be to think that? Am I living in Douglas Adams's reality, where white mice are really running experiments on humans?
    • If you're aiming for humor I find it fell way short... your silent ethernet packets are aimed at the antenna, not the microphone, which is the subject of TFA.

      The phones are susceptible to silent control VIA THE MIKE.

      And as for white mice, I, for one, welcome our new Presidential Overlords, Pinky and the Brain. They've *got* to be better than what we've had since 1969!!!

      • His point is these devices are already controlled by the network and the mega corporations that control the device. Those corporations can instruct those devices to do whatever they wish. You don't "control" them, you just use them to get access to some of their functionality. I don't find that humorous myself.
        • by DogDude ( 805747 )
          Yup, that's what I meant, thanks.
        • by jetkust ( 596906 )
          Taking the word "control" out of the context of the article is mostly what I'm getting out of this take...
        • There's a difference between something that can be done by some large corporations that don't want to scare away customers, and something that can be done by anyone with a little technology from outside if your window is open.

    • Am I living in Douglas Adams's reality, where white mice are really running experiments on humans?

      Of course not.

      They're brown mice. Kind of a chestnut brown. The white mice thing was a ruse so you'd choose the wrong observers.

  • In voice recognition, the first thing you usually do is apply filters to the signal, removing anything below 1 kHz and above somewhere around 8 kHz or 10 kHz.

    There is no way that there can be a subliminal message in infrasound or ultrasonic sound.

    How would you actually "interpret it"? You would need a deliberate trojan horse/backdoor to translate a human voice sentence "transmitted" at infrasound into something the machine can interpret as a message, same for ultrasonic sounds. With infrasound you probably would
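On the question above of how an ultrasonic signal could be "interpreted" without a backdoor: the DolphinAttack research linked earlier in this thread relies on microphone hardware being slightly nonlinear, so an amplitude-modulated ultrasonic carrier is partly demodulated back into the audible band before any software filter runs. A minimal sketch of that effect, assuming a simple quadratic model of the microphone; the sample rate, carrier, test tone, and the `a2` coefficient are illustrative assumptions, not measured values.

```python
import math

FS = 192_000          # sample rate in Hz (assumed; high enough to represent the carrier)
CARRIER_HZ = 30_000   # ultrasonic carrier, above human hearing
VOICE_HZ = 1_000      # single tone standing in for a voice command
N = 19_200            # 0.1 s of audio, a whole number of cycles for every tone used


def tone_amplitude(signal, freq_hz, fs=FS):
    """Amplitude of one frequency component (single-bin DFT)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / fs) for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / fs) for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


def am_ultrasound(n_samples, depth=0.8):
    """Amplitude-modulate the 'voice' tone onto the ultrasonic carrier."""
    return [
        (1 + depth * math.cos(2 * math.pi * VOICE_HZ * n / FS))
        * math.cos(2 * math.pi * CARRIER_HZ * n / FS)
        for n in range(n_samples)
    ]


def nonlinear_mic(signal, a2=0.1):
    """Hypothetical microphone model: linear response plus a small squared term."""
    return [x + a2 * x * x for x in signal]


transmitted = am_ultrasound(N)
captured = nonlinear_mic(transmitted)

in_air = tone_amplitude(transmitted, VOICE_HZ)     # energy at 1 kHz in the air
after_mic = tone_amplitude(captured, VOICE_HZ)     # energy at 1 kHz after the mic
print(f"1 kHz in air: {in_air:.6f}, after microphone: {after_mic:.6f}")
```

In this model the transmitted signal has essentially nothing at 1 kHz (only components at 29, 30, and 31 kHz), while the captured signal has a 1 kHz component of roughly `a2 * depth`. A band-pass filter applied in software afterwards would pass that component like any other audible sound.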

    • by gweihir ( 88907 )

      Indeed, it is not. The first thing you filter is anything that is not very close to the target signal. Yet the functionality seems to be there. Probably some preparation to have your smartphone or computer talk to them without you hearing it. That is creepy as hell.

    • by green1 ( 322787 )

      Near as I can tell from the poor explanations given, the sounds aren't actually inaudible, they're simply disguised. It's not that a human hears nothing while the device hears a command, it's that a human hears white noise, or music, or unrelated speech, and the device hears a command.

      Considering that computers and humans "hear" in very different ways, it's not really a surprise that you can craft an audio signal that sounds like one thing to a human, and yet sounds like something different to a computer.

      Wh

  • According to reports a man could be heard yelling the phrase "Alexa open the front door" shortly before the TV was noticed missing.

    A suspect was later apprehended with missing TV found in Frunk of his self-driving get away vehicle after it autonomously allided with an inanimate barrier.

  • by MindPrison ( 864299 ) on Thursday May 10, 2018 @01:47PM (#56590070) Journal

    Hi, former technician here.

    I've been constructing and building so many robotic, listening devices, radio communication devices that I have enough under the belt to tell you that you don't really need to worry TOO much about all of that, at least not for now, here's why:

    1) For this to be at all possible, the devices involved must meet a range of technical specifications and capabilities. For example, a mobile speaker may be specced to work from 20 Hz to 20 kHz, but most of these fail above 10 kHz anyway, and they don't need to be better than that for their purpose. Headphones, however, are an entirely different case.

    2) I've tested numerous microphones so small we're talking 2-3 mm in size, and most of these failed to pick up frequencies above 20 kHz. As a young person, you could potentially hear up to 24 kHz (I could pick up 23 kHz sounds when I was 18 and worked in an electronics store; we tested with a function generator and a piezo speaker specced well above 28 kHz). Today I can pick up around 16.5-17 kHz, which is not bad for my age, but on the plus side, I don't need expensive headphones anymore.

    3) We're talking about sounds inaudible to the human ear here, therefore we're above the 20 kHz range; to be entirely safe, we should be above 25 kHz. Very few phones, televisions, computer speakers and whatnot are capable of vibrating or picking up vibrations at those speeds, therefore this kind of communication in that frequency spectrum would fail drastically.

    What you COULD do, though, is use the upper audible frequency spectrum, say just above 10 kHz, and mix it with existing sounds, timed correctly with proper known synchronization (remember the old modems and their sounds? Now imagine a much higher pitch). Albeit quite slow, it would still be possible to use it to trigger commands, communicate short messages, etc. Anything needing more bandwidth than this would be impractical. You wouldn't hear this; the sound would technically be possible to pick up if it went on too long, but as split-second bursts in a sequence not spaced too closely, you'd be able to get away with it, possibly disguised by music or voice. You'd still need some form of "trigger" sequence to pick it up and start reading, otherwise you'd get timing errors. Kinda like "fast Morse code", if you like.
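The "fast Morse code" idea above can be sketched as short tone bursts with a known trigger pattern in front. This is only an illustration under assumed parameters: the 12 kHz / 14 kHz tone pair, the 10 ms burst length, and the `PREAMBLE` bit pattern are all hypothetical choices, and the signal here is noiseless with no music mixed in.

```python
import math

FS = 48_000                           # sample rate in Hz (assumed)
SYMBOL = 480                          # samples per bit: 10 ms bursts
F_ONE, F_ZERO = 12_000, 14_000        # illustrative tones "just above 10 kHz"
PREAMBLE = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical trigger sequence


def tone(freq_hz, n_samples):
    return [math.sin(2 * math.pi * freq_hz * n / FS) for n in range(n_samples)]


def encode(bits):
    """Trigger pattern followed by the payload, one tone burst per bit."""
    out = []
    for b in PREAMBLE + bits:
        out.extend(tone(F_ONE if b else F_ZERO, SYMBOL))
    return out


def power(window, freq_hz):
    """Signal power at one frequency (single-bin DFT over the window)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / FS) for n, x in enumerate(window))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / FS) for n, x in enumerate(window))
    return re * re + im * im


def read_bits(signal, start, count):
    """Decide each bit by comparing power at the two tones; also return a score."""
    bits, score = [], 0.0
    for k in range(count):
        window = signal[start + k * SYMBOL : start + (k + 1) * SYMBOL]
        p1, p0 = power(window, F_ONE), power(window, F_ZERO)
        bits.append(1 if p1 > p0 else 0)
        score += max(p1, p0)
    return bits, score


def decode(signal, n_payload_bits, step=60):
    """Scan for the trigger pattern, keep the best-aligned match, read the payload."""
    total = (len(PREAMBLE) + n_payload_bits) * SYMBOL
    best = None
    for start in range(0, len(signal) - total + 1, step):
        bits, score = read_bits(signal, start, len(PREAMBLE))
        if bits == PREAMBLE and (best is None or score > best[1]):
            best = (start, score)
    if best is None:
        return None  # no trigger found, so nothing is read
    payload, _ = read_bits(signal, best[0] + len(PREAMBLE) * SYMBOL, n_payload_bits)
    return payload


message = [1, 0, 0, 1, 1, 1, 0, 1]
signal = [0.0] * 1234 + encode(message) + [0.0] * 500   # arbitrary silence before and after
decoded = decode(signal, len(message))
print(decoded == message)
```

The scan over candidate start positions is what the trigger sequence buys you: without it the receiver has no way to know where a burst begins, which is the timing-error problem mentioned above. At 10 ms per bit this carries only 100 bits per second, in line with the "quite slow" caveat.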

    If you're worried about eavesdropping, you should be far more concerned with your home's windows - those are like giant eardrums, and light hitting those will create a small vibration of the reflected light, this tech has been known for years, you just don't hear about it very often.

    • by green1 ( 322787 )

      I think you're missing the real attack. It doesn't seem like things are inaudible, but more that they're disguised as other sounds. Being that computers and humans "hear" very differently, it's not really a surprise that you can craft a sound that would sound like one thing to a computer, but something else to a human listener.

      Something trivially solved with voiceprints, a several decade old technology.

      • Something trivially solved with voiceprints, a several decade old technology.

        Very true.

        Another thing I was playing with here the other day was the ability to use the phone's ever-increasing high-resolution cameras as listening devices. When the phone is left on the table, or perhaps in a charging docking station, the camera (or a small addressable area of interest) could be used to record the vibration of surrounding objects, which can in turn be converted back into sound.

  • The only thing that will disable this is cutting power to the internal microphone. Windows themselves are one of the ways we used to "hear" conversations, typing (which can also be picked up by your cellphone and any device with a microphone, as well as nearby vibration sensors in your cellphone).

    Even inaudible humming frequently can be translated.

    Just don't install devices in your tin foil shielded and sound baffled escape room, and make sure it's not just airgapped but it's also without fans.

    (thinks about

  • by The MAZZTer ( 911996 ) <megazzt.gmail@com> on Thursday May 10, 2018 @02:52PM (#56590590) Homepage
    The basic form of this problem was solved long ago by using user accounts and permissions to give everyone their own preferences and storage spaces and dictate who has access to what resources. It just needs to be extended to these assistant devices by using voice recognition. Then any attack would have to be personalized for you, which solves any attack trying to throw a wide net. Personalized attacks would have to be addressed by having the assistant verify it sounds like a real voice by a previously identified user and not a synthetic voice that's been shifted into an inaudible range or whatever.
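A minimal sketch of the accounts-and-permissions idea above, applied to voice commands. Everything here is hypothetical: the account names, the action names, and the `speaker_id` and `looks_live` inputs, which are assumed to come from some upstream speaker-recognition and liveness-check component that this sketch does not implement.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceAccount:
    name: str
    allowed: set = field(default_factory=set)   # actions this speaker may trigger


# Hypothetical household: only alice may unlock doors or buy things.
ACCOUNTS = {
    "alice": VoiceAccount("alice", {"play_music", "unlock_door", "purchase"}),
    "guest": VoiceAccount("guest", {"play_music"}),
}


def authorize(speaker_id, action, looks_live):
    """Allow an action only for a recognized, live-sounding, permitted speaker.

    speaker_id: account name reported by an assumed voice-recognition step,
                or None when the voice matched nobody.
    looks_live: result of an assumed check that the audio is a real voice
                rather than a synthetic or frequency-shifted one.
    """
    if speaker_id is None or not looks_live:
        return False
    account = ACCOUNTS.get(speaker_id)
    return account is not None and action in account.allowed


print(authorize("alice", "unlock_door", True))   # recognized and permitted
print(authorize("guest", "unlock_door", True))   # recognized, not permitted
print(authorize(None, "purchase", True))         # unrecognized voice, e.g. a broadcast ad
```

The wide-net case (a TV commercial or a hidden command in a video) fails at the first check because it matches no enrolled voice. The targeted case rests entirely on how good the assumed recognition and liveness checks are, which is the part other commenters in this thread dispute.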
  • vibrator to earthquake
  • My Pixel 2 can't even hear me when it's in my pocket, so I'm not overly concerned

  • I'll buy all sorts of gadgets. I'm not opposed to smart-anything. But one thing I will never own is a device with a constantly listening microphone. I've tested the Alexa and it's actually scary how perceptive it is. If I whisper "alexa...", under my breath in another room, it lights up. Volume is irrelevant so long as the speech is clear. If it can hear that, what else can it hear? Everything. By design, the microphone is constantly on. You can argue that it's not always recording, but it is on, and that's
