Privacy IOS Software Apple

Siri Keeps Your Data For Two Years 124

New submitter LeadSongDog writes with news that Apple has provided information on how long it holds onto voice search data used by its digital assistant software Siri. Speaking to Wired, an Apple representative said the data is kept for two years after the initial query. "Here’s what happens. Whenever you speak into Apple’s voice activated personal digital assistant, it ships it off to Apple’s data farm for analysis. Apple generates a random numbers to represent the user and it associates the voice files with that number. This number — not your Apple user ID or email address — represents you as far as Siri’s back-end voice analysis system is concerned. Once the voice recording is six months old, Apple “disassociates” your user number from the clip, deleting the number from the voice file. But it keeps these disassociated files for up to 18 more months for testing and product improvement purposes." This information came in response to requests for clarification of Siri's privacy policy, which was not very clear as written. The director of privacy group Big Brother Watch said, "There needs to be a very high justification for retaining such intrusive data for longer than is absolutely necessary to provide the service."
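
The lifecycle described in the summary (a random number per user, disassociation at six months, deletion by two years) can be sketched roughly as follows. This is only an illustration of the stated policy, not Apple's implementation: the `VoiceClip` type, the function names, and the exact day counts are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional
import secrets

# Assumed thresholds based on the summary: about six months with the
# random number attached, then up to 18 more months without it.
DISASSOCIATE_AFTER = timedelta(days=183)  # "six months" (approximation)
DELETE_AFTER = timedelta(days=730)        # "two years" (approximation)


@dataclass(frozen=True)
class VoiceClip:
    """Hypothetical record for one stored voice query."""
    recorded_at: datetime
    random_user_number: Optional[str]  # random identifier, not an Apple ID or email


def new_random_user_number() -> str:
    """Generate a random identifier that carries no account information."""
    return secrets.token_hex(16)


def apply_retention(clip: VoiceClip, now: datetime) -> Optional[VoiceClip]:
    """Return the clip as it should be stored at time `now`, or None if deleted."""
    age = now - clip.recorded_at
    if age >= DELETE_AFTER:
        return None  # past the two-year window: drop the recording
    if age >= DISASSOCIATE_AFTER:
        # Keep the audio for testing, but strip the user number.
        return VoiceClip(clip.recorded_at, None)
    return clip
```

Note that stripping the number does nothing about the recording itself, which is the point several commenters raise below about voices being identifiable on their own.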

  • Backups (Score:2, Insightful)

    by Anonymous Coward

    How long are the backups of these systems kept for? Do they require a subpoena to get those backups, or will Apple cheerfully hand it over to any agency that asks?

    • or will Apple cheerfully hand it over to any agency that asks?

      If you need to ask, you can probably already figure out the answer.

      • Well, considering how the DEA is complaining that they can't read encrypted iMessages, and Apple got rid of Google Maps as default partly because Google kept demanding more personally identifying user data, I don't think we should assume Apple always rolls over on stuff like this.

        • > "google kept demanding more personally identifying user data"

          Do you actually have any evidence to back that up?

          As far as I know, Google wasn't getting much out of the deal. Apple was getting all of Google's map data and Google was getting some data about traffic patterns. Google wanted more branding. Apple wanted turn-by-turn directions, something that Android used to distinguish itself as better than Apple.

          Where are you getting this story that Google was demanding more personal info from Apple?

  • by Anonymous Coward
    Big Brother will come along at least once during that period so you can rest assured knowing it's stored for eternity.
  • is unfortunately in the eye of the beholder... The US government's reliance on its ability to access private data has helped so much with the Boston suspects, we will wrest these gains into the intrusion of privacy from their cold, dead hands.
    • Re: (Score:1, Informative)

      by Anonymous Coward

      Joking I hope? We have no idea how they claim to have found these guys yet.

      The Government has relied on people turning in pictures and information "as far as we know" and did not find these guys by spying. I'm not claiming the Govt won't use that as an excuse, I'm saying it's untrue so you should not buy it if they do.

  • by Anonymous Coward
    "Siri, how much fuel oil should I mix with 25 pounds of ammonium nitrate?"
    • Re: (Score:2, Funny)

      by Anonymous Coward

      about a pint, but it's not critical. And AN comes in 50 pound sacks. 25 lb marks you as a newb.
      But nitromethane works better.

      • Re: (Score:2, Funny)

        by Anonymous Coward

        What if he's making one for now and one for later?

    • Siri says "Let me check for the answer.. while I inform the FBI of your request."

  • by Anubis IV ( 1279820 ) on Friday April 19, 2013 @04:00PM (#43498237)

    Anyone have the timeline for Google's disassociation and destruction of search queries? I'm curious how Apple's policies compare against those.

    • Well, you can disable Google saving your search at all... so there is that.

      • by fazey ( 2806709 ) on Friday April 19, 2013 @04:19PM (#43498421)
        You mean google has an option to hide your search history from you?
      • by Anubis IV ( 1279820 ) on Friday April 19, 2013 @04:20PM (#43498439)

        From what I can tell, disabling Google History doesn't seem to come with a promise that Google doesn't keep that data somewhere else. What they say they'll do is stop using your History to present targeted advertising for you across their services, or you can choose to delete individual items from your search history, that way they aren't considered when it comes to determining your interests and the like. What they very carefully seem to avoid saying is that they completely delete your queries from all of their systems, so I wouldn't be surprised if they're still using them in some sort of anonymized form for product improvement purposes, tracking trends, or other things of that sort.

        • Re: (Score:3, Insightful)

          by sqrt(2) ( 786011 )

          Actually, turning off search history doesn't even do as much as you say. They still use everything you enter into their services, every keystroke, how long you spent looking at a page, when you searched and from where. They use all of that and more to target ads (which many of us never see anyway thanks to Adblock Plus).

          Turning off search history hides this data from YOU. They still have it. They still have it associated with your account, and even if you are logged out it's associated with your IP address.

          • Google ads are white listed for me.
            Plain text. On topic. Unintrusive and helps out the company giving me good free shit.
            Every great once in a while I actually click on an ad because it is something I want.

            Of all ads on the internet. Google ads cause me the least pain.

            • by sqrt(2) ( 786011 ) on Friday April 19, 2013 @06:01PM (#43499287) Journal

              Perfectly reasonable. Myself, I've never seen an advertisement that was legitimately helpful. I'm dubious that there ever could be such a thing because advertising is fundamentally an adversarial relationship between the advertiser and the target of the ad (you): you have money that you want to keep, or get the most value for when you do spend it; they want to give you as little as possible while taking as much of your money as they can. You are fighting each other, you have competing interests. You can see why there's a huge incentive for them to lie, or get as close to lying as they legally can, and emotionally manipulate you in their pursuit of your dollars. I find attempts at such manipulation repugnant, which is probably why I walk around most of the day with a mild nauseated sensation. Still, I'd choose that over the syrupy haze of blissful ignorance.

              Google's official ads might be the least intrusive, but their disguised ads are rather pernicious, IMO. For example, every product you are shown when using Google Shopping is a paid product advertisement, every single product. They are ALL ads, and nowhere is this disclosed clearly. They are trying to pass it off as a store like Amazon (which has plenty of hidden ads too, but they at least make a passing nod towards identifying them) but it's more like the yellow pages. You have to pay Google for your product to appear there.

              • I'd mod you up but I blew the opportunity when I made a stupid comment above...
              • by TheLink ( 130905 )

                And I on the other hand have seen many advertisements that have been entertaining, amusing, funny and interesting.

                That's legitimately helpful enough for me, even if I never intend to buy their products:
                https://www.youtube.com/watch?v=vDGrfhJH1P4 [youtube.com]
                https://www.youtube.com/watch?v=WDncfptDjPU [youtube.com]
                https://www.youtube.com/watch?v=IJNR2EpS0jw [youtube.com]

              • Google had to make a choice with Shopping.

                1) Allow absolutely everyone to register for free. Then put up with all the spammers who place fake products and prices in order to get people to their sites.
                2) Charge a tiny fee that prevents spammers from overrunning the place.

                This was similar to the idea that charging a penny per email would run spammers out of business. Only, there is no way to charge for email, given the way the Internet works. But, there was the opportunity to do it with Shopping.

                It's not l

          • "or use Duckduckgo"

            here's another one to use; I've been using it for about a month and like it. Combines ixquick with Google results, and offers additional goodness, such as SSL, no cookies, proxy. (One search engine I miss is Kartoo - if it was still around it would be great along with this kind of anonymized, trackless search.) It also avoids handing over referrer info - which can be used to track you regardless of IP, depending on your settings.

            https://startpage.com/ [startpage.com]

        • From what I can tell, disabling Google History doesn't seem to come with a promise that Google doesn't keep that data somewhere else.

          I am pretty sure, based on experiences I've spelled out on Slashdot before, Google never actually deletes anything. When you select "delete" they basically just make it unavailable to you.

      • Well, you can disable Google saving your search at all... so there is that.

        No you can't. You *may* be able to stop them associating them with your account. But they still save the searches.

      • No, you have the option of not tying your Google searches to a user account and a specific name, but if you think that saves you then you don't belong in public.
    • by Anubis IV ( 1279820 ) on Friday April 19, 2013 @04:27PM (#43498483)

      Well, I've been searching since I made the comment, and the best I've found so far is this thread [google.com] where a Google rep confirms that for every image search they keep a thumbnail of the item that was clicked on, as well as the IP address for 9 months (after which it gets anonymized), and identifying information for the cookie associated with you for 18 months (after which it gets anonymized and the IP address gets partially destroyed). What that means is that they never fully destroy the data, and that if the query was self-identifying in some way, someone could still tie all of the queries you made together since they would still be associated with the cookie data, even if that cookie data is no longer associated with you.

      Take it with a grain of salt, however, since that's from back in 2011. As we all know, these tech companies have made big strides to protect our privacy better since then. Wait, no, I have that backwards.
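
To make the linkage point concrete, here is a rough sketch. The thread does not say how the IP address gets "partially destroyed", so zeroing the last octet is purely an assumption, as are the function names and the log layout. The sketch shows that truncating the IP still leaves every query grouped under the same cookie identifier.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def truncate_ipv4(ip: str, keep_bits: int = 24) -> str:
    """Zero the low bits of an IPv4 address (assumed anonymization scheme)."""
    network = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(network.network_address)


def group_by_cookie(
    log: Iterable[Tuple[str, str, str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (cookie_id, ip, query) rows by cookie after truncating the IP."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for cookie_id, ip, query in log:
        grouped[cookie_id].append((truncate_ipv4(ip), query))
    return dict(grouped)


# Hypothetical log rows using documentation-reserved IP ranges.
log = [
    ("cookie-a", "203.0.113.77", "my own name"),
    ("cookie-a", "198.51.100.9", "clinics near me"),
    ("cookie-b", "203.0.113.5", "weather"),
]
print(group_by_cookie(log)["cookie-a"])
```

Both "cookie-a" queries stay together even though the addresses were truncated, so one self-identifying query is enough to attach a name to the rest.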

    • Anyone have the timeline for Google's disassociation and destruction of search queries? I'm curious how Apple's policies compare against those.

      Well Google saves your searches against your Google account if you have one. And they save them for years. So it's a whole different ballgame.

    • Anyone have the timeline for Google's disassociation and destruction of search queries? I'm curious how Apple's policies compare against those.

      Going by this [slashdot.org], not until somebody actually forces them.

  • by Maxwell ( 13985 ) on Friday April 19, 2013 @04:02PM (#43498261) Homepage

    My guess is the overlap between "people who complained Siri wasn't accurate" and "people who don't want Apple keeping any Siri data so they can make it better" is pretty close to perfect.

    Google reads your mail. Apple listens to your ravings. Don't like it, don't use it. And they only keep 'your' (i.e. identifiable) data 6 months.

    • by nine-times ( 778537 ) <nine.times@gmail.com> on Friday April 19, 2013 @05:18PM (#43498983) Homepage
      Yeah, I find myself not minding this so much. I do think electronic records should somehow "sunset" at some point, even if it's after a few years, for various reasons. However, I don't see what the big deal is whether Apple retains the data for 1 month vs. 6 months vs. 2 years.

      When I used Siri for the first time and realized it was sending my questions to a datacenter somewhere, I had an immediate reaction of "that's a bit creepy and disconcerting." But once the data is sent out to the datacenter for processing, you've already opened the door for the data to be misused. Once you assume that the data will be stored for some amount of time, you increase the chances for the data to be misused. But if you extend the time that the data is stored for a few months or a year, I don't feel like you're greatly increasing your exposure.

      What holding on to the data actually does is give Apple some time to process and analyze the data, improving the speech recognition and heuristic models. I'd expect them to want to keep it for a couple of years, especially since Siri is new and they're probably still developing their methods for analyzing the data. In this sort of situation, having more data means being able to create a more accurate analysis.

    • Well...it's a voice recording, so it's still somewhat identifiable even if they don't store any further information with it...

      • Yeah, and it's a lot easier to change your IP than your voice....

        even if they eventually remove all personally-identifying info from a query, getting a voice match on all your searches will last as long as they keep them.

  • It's becoming exceedingly difficult to keep your search history private. All the major search companies keep it, Apple keeps Siri searches, etc. DuckDuckGo I believe keeps things as anonymous as you can get. There are also some hacks you can do if you are careful, privacy mode/ incognito is a start, but even then it's easy to tip your hand. If you are truly doing something crazy, use a bootable USB and do your searches from a random public wifi hotspot.
    • by tftp ( 111690 )

      StartPage [startpage.com]

      • Your ISP will rat you out.

        • by tftp ( 111690 )

          Your ISP will rat you out.

          Pray tell how, unless the ISP is capable of a MITM attack on an SSL connection.

      • by cffrost ( 885375 )

        StartPage [startpage.com]

        (Also known as Ixquick [ixquick.com]) is good, as is DuckDuckGo [duckduckgo.com], for those who value privacy.

  • by tuppe666 ( 904118 )

    I am getting tired of Apple's continuing privacy abuses; first they sell their customers to the highest bidder, now this.

    Even Siri was ruined with advertising http://www.inquisitr.com/256025/steve-wozniak-says-apple-ruined-siri-technology-after-acquisition/ [inquisitr.com] "Steve says he initially loved Siri because it could accurately answer questions such as “What are the five largest lakes in California?” and “What are the prime numbers greater than 87?” . To which Wozniak replied, “It’

    • Wolfram Alpha answers both questions accurately. I don't know why anyone would prefer Siri over other tools for answering encyclopedic questions.

      • by Megahard ( 1053072 ) on Friday April 19, 2013 @04:22PM (#43498447)

        I just tried it with Siri and it also punts to Wolfram Alpha so the answers are identical. There's no lakefront properties.

    • I just tried both of those and was given correct answers with no ads. The prime number question gave results from Wolfram Alpha.
    • Re: (Score:3, Insightful)

      by Nidi62 ( 1525137 )

      question about prime numbers now displays information about prime ribs."

      In Siri's defense, prime rib is pretty damn awesome

    • by sosume ( 680416 )

      At least they provide a longer data retention than guarantee on their products.

    • I am getting tired of Apple's continuing privacy abuses; first they sell their customers to the highest bidder, now this.

      Honest question: when did Apple sell anything related to their customers to the highest bidder? I can't find any information about anything along those lines, yet I've seen you repeat it at least twice in here.

      • by tftp ( 111690 )

        Honest question: when did Apple sell anything related to their customers to the highest bidder? I can't find any information about anything along those lines

        As if you would normally find information about such transactions plastered all over the town? As if you'd normally find any business contract between corporations published for everyone to see?

        These deals are signed in boardrooms, by VPs and above, and they stay among that crowd. Even if an IT worker at some point sets up a link between databases,

        • The fact that there is no proof of transactions must be EVIDENCE that such transactions are occurring behind closed doors! Quick, someone fetch me my tinfoil hat!
        • by node 3 ( 115640 )

          In other words, it's just entirely made up. Thanks for the clarification.

        • He's claiming elsewhere that it was a $400B deal.

          Not only would a deal like that have to be disclosed in SEC filings (i.e. it wouldn't stay private), but if Apple had sold their customers' data to Google for that price, it would have bankrupted Google dozens or hundreds of times over, since they have nowhere close to that much money on hand. Google's market cap is only around $250B at the moment, so Google could literally sell itself to Apple (assuming it magically gained control of those shares) and still

          • by tftp ( 111690 )

            He's claiming elsewhere that it was a $400B deal.

            That is indeed ridiculous. I would easily accept a $40M deal. A $400M deal would already be very hard to imagine; many *companies* aren't worth that much. Normally a CEO can work with 20-30 million USD with relative ease, such as acquiring small companies or making deals of this sort; but anything beyond that triggers a completely different set of procedures.

    • Actually, my favorite Siri-choke is sunrise/sunset.

      Ask Siri "What time is sunset?" and Siri will tell you. Ask Siri "What time was sunrise?" and Siri will say something to the effect that it can't tell you the weather in the past. Ask Siri "What time will sunset be next Tuesday?" and it will say something to the effect that it doesn't know how to get the weather that far ahead.

      Huh? What does sunset have to do with weather? Well, Siri gets sunrise/sunset information from the same place as the weather. S

  • Sample data... (Score:5, Interesting)

    by sl3xd ( 111641 ) on Friday April 19, 2013 @04:23PM (#43498457) Journal

    Everyone I've ever spoken to or read about in the field of voice recognition tells me that having samples of people's voices is critical to improving it... and getting those samples (mainly the raw quantity of samples) is the biggest problem they face.

    So it doesn’t surprise me at all that anyone keeps a massive archive of samples... the sample data can be critical in improving voice recognition.

    As an aside: Google Voice's voice mail feature does more or less the same thing... and the reasoning is the same also: More sample data means better voice recognition.

    I can't help but shake my head at the comparison:

    Google samples user voices, reads (and transcribes) voice mail, reads your email, your stock information and then feeds it into their advertising engine, and does this for four years and counting; reaction: Meh...

    Apple samples voices, anonymizes them, uses them to improve voice recognition over a period of two years; reaction: EVIL! APPLE MUST DIE!

    • Re: (Score:2, Interesting)

      Anonymized voice sample you say? "Voice Print Identified" I say. Hell, I create my own image and speech recognition software from scratch, and I don't need all those fucking samples. I just need to run the samples through my algorithms at most twice -- Once, then again to test if the changes were beneficial or not. If I have a constant stream of users (new samples), and I'm smart -- read: Not fucking daft -- then I can just run the samples through once, and let the users of the system rate the samples

      • Re:Sample data... (Score:4, Interesting)

        by sl3xd ( 111641 ) on Friday April 19, 2013 @06:21PM (#43499433) Journal

        Voice prints are a real thing, of course; my point isn't that it's not possible to identify people from a voice sample.

        My point is that Apple doesn't make its money by selling you, me, and everyone else to the highest bidder, nor does its business have any real advantage in profiling us. Apple's business isn't advertising, it's selling hardware. (The flop that is iAd notwithstanding)

        Google, on the other hand, is entirely different: Their entire revenue stream is from collecting our personal information, categorizing and analyzing it, and then selling or otherwise making that data useful to its actual customers, ie. its advertisers.

        Hell, I create my own image and speech recognition software from scratch, and I don't need all those fucking samples. I just need to run the samples through my algorithms at most twice -- Once, then again to test if the changes were beneficial or not

        If you honestly believe that, then you've never spent even a minute actually learning the basics of speech recognition, let alone the level of complexity involved in modern algorithms. Signal processing isn't like database programming, where you get a nice result that fits into a box, and can easily reduce unwanted side effects.

        Also keep in mind, there's a difference between "automatic speech recognition", where whole sentences are parsed and understood (such as used with Siri or Google), versus "discrete speech recognition", where very limited actions are understood (like older cell phones when you spoke "dial ").

        The problem is that while you might have improved the recognition for one specific sample, you've now made it considerably worse for another... so you have to build up a massive library of samples to do regression testing. One of the biggest challenges in speech recognition over the years is the utter lack of sample data for a wide populace, coupled with computers that are unable to hold enough samples in memory to do any meaningful comparisons.

        We've only recently started to see speech recognition of that calibre, and even then, it's accomplished by sending a recording off to a datacenter so fraking huge that it'd easily sit at the top of the TOP500 supercomputer list if their owners bothered to run linpack on it. It's no coincidence that it's also only been in the past couple of years speech recognition has become anything more than a lame joke.
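
The regression-testing argument above can be illustrated with a small sketch. Everything here is hypothetical: the recognizers are stand-in lookup tables rather than real speech models, and the corpus is three made-up labelled clips. The idea is just that a change which fixes one sample can break another, and you only see that by replaying a stored corpus through both versions.

```python
from typing import Callable, List, Tuple

# A recognizer maps a sample identifier to a transcript (stand-in for real audio).
Recognizer = Callable[[str], str]


def regression_report(
    corpus: List[Tuple[str, str]], old: Recognizer, new: Recognizer
) -> Tuple[List[str], List[str]]:
    """Return (fixed, broken): samples the new version fixed or newly got wrong."""
    fixed, broken = [], []
    for sample, expected in corpus:
        old_ok = old(sample) == expected
        new_ok = new(sample) == expected
        if new_ok and not old_ok:
            fixed.append(sample)
        elif old_ok and not new_ok:
            broken.append(sample)
    return fixed, broken


# Hypothetical labelled corpus and two toy "model versions".
corpus = [("clip1", "call mom"), ("clip2", "set a timer"), ("clip3", "play music")]
old_table = {"clip1": "call mom", "clip2": "set alarm", "clip3": "play music"}
new_table = {"clip1": "call mum", "clip2": "set a timer", "clip3": "play music"}

fixed, broken = regression_report(
    corpus,
    lambda s: old_table.get(s, ""),
    lambda s: new_table.get(s, ""),
)
print("fixed:", fixed, "broken:", broken)
```

With only three clips the report is trivial; the comment's claim is that the same comparison needs a very large and varied corpus before it says anything about a wide populace.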

    • by jonwil ( 467024 )

      The issue isn't that they retain the voice samples, it's that they store user information for 6 months when they don't need to store user information for longer than it takes to complete the query and return the results.

      • by mattr ( 78516 )

        well they probably would want to keep the data as a corpus of text that can be further analyzed or used to guide further searches. It's just that it can be quite abused... and many people these days would rather have the data deleted immediately rather than improve a service that is less than crucial to one's life, so far.

  • In reference to an earlier question about Google's data retention policies, one of the comments [slashdot.org] provided a great link to a 2007 Google blog post [blogspot.co.uk] that describes why Google holds onto their data for 18 months before they anonymize it. One of the interesting things that was said was:

    However, we must point out that future data retention laws may obligate us to raise the retention period to 24 months.

    Given that the blog post was written back in 2007, isn't it now possible that 24 months is simply the earliest that a company like Apple is allowed to delete the query, given the various data retention regulations that are in place

  • If I were in charge of Siri, I'd do the same thing. That kind of real-world data is vital for regression testing. If you don't have a strong corpus of sample data, when you make changes to the code, you've got no idea if what you are doing is improving the situation for some cases, while damaging them for others. You would see people complaining about things like "Well Siri used to work for X query but now it doesn't". When you have this data, you can update the code, run the test suite, and see if it

    • That kind of real-world data is vital for regression testing. If you don't have a strong corpus of sample data, when you make changes to the code, you've got no idea if what you are doing is improving the situation for some cases, while damaging them for others

      Aaaand, unless you run ALL those data samples back through the system in front of a HUMAN, then you STILL have "no idea if what you are doing is improving the situation" at all. So, the point still stands: Keeping a sampling of the data is acceptable. Keeping the lot of it isn't helping anyone you actually want to help -- Least of all the developers. Hell, they could improve the service immensely by simply dropping the data storage requirements!

      The reason they keep this data is not to improve the fucki

      • by Bogtha ( 906264 )

        Aaaand, unless you run ALL those data samples back through the system in front of a HUMAN, then you STILL have "no idea if what you are doing is improving the situation" at all.

        Yes, you do. Have you ever used Siri? There are several places where you can reliably determine that recognition was successful, due to manual confirmation or subsequent actions. For instance, if I ask Siri to remind me to do something at 9 o'clock, it might ask me if I mean 9am or 9pm. Anybody who answers either way instead

    • by mysidia ( 191772 )

      If I were in charge of Siri, I'd do the same thing.

      And I suppose, if you were writing a web browser, it would upload screenshots of sites visited, to help your team ensure proper rendering?

      I think the point is not that the recordings are useful (or not), but that it is invasive to record voices talking to Siri.

      And especially since it is not well advertised -- the argument can easily be made that not everyone has necessarily given their consent (especially, if, for example, a friend uses your pho

  • Don't like it? Don't use it.
    • Don't like it? Don't use it.

      I don't, and I don't. But other people I care about do. Should I ignore the misdeeds simply because the victim isn't me?

      • by TheLink ( 130905 )
        But are they really victims? We can tell them how much info Apple/Google/etc gathers on them but if they don't care or think its worth it what's the big problem? Most people don't care about such stuff.

        It's like a friend eating his favourite fried chicken at his favourite dining place. It's bad for his health but is he a victim?
  • A "high justification"? How about speech recognition that actually works?

    Training speech recognisers requires data. The biggest reason why speech recognition has improved in the recent years: lots of data.

    Speech recognition in the cloud has given companies like Apple and Google a reason/excuse to gather masses of training data. They have put it to good use: speech recognition is much better than it was. If you like speech recognition, use it, meanwhile donating your data and helping the rest of us. If you d

  • somewhere in a data warehouse with only a few humans, there are millions of disassociated voices crying out to be heard. "But it keeps these disassociated files for up to 18 more months for testing and product improvement purposes."
  • As they map your voice recordings to an ID and map that ID to your Apple ID, I find it very strange they can claim it's anonymous!

    From a better article:
    http://arstechnica.com/apple/2013/04/apple-remembers-where-you-wanted-to-get-drunk-for-up-to-2-years/ [arstechnica.com]

    Muller pointed out, however, that the identifiers are deleted immediately—"along with any associated data"—when a user turns Siri off on his or her device. (You can do this by going to Settings > General > Siri on a supported iOS device.)

    If yo
