Hacker News

Boy, have I got a surprise for you. I was an engineer at a web analytics firm a decade ago, and yes, ISPs have your web browsing data and are selling it left and right. So are apps, cell phone companies, etc. Our company bought all that data, and when that wasn't enough, we created apps that collected even more: every click, AJAX request, etc., all timestamped.

Yes, there are analysts sifting through your browsing data (if you're lucky, vaguely anonymized). Yes, I heard countless stories of this data being abused and misused. I simply can't imagine it has gotten much better by now.



> ISPs have your web browsing data

Since you worked in this area: What specific things do they track, and by what technical mechanism? DNS requests? (Do they capture those that don't go to their servers?) IP addresses? HTTP snooping? Full HTTP (non-TLS) MITM?


I wasn't responsible for the data intake, but I know that the data was extensive, and always included time on page, full URL, and other request information (often POST data).

I know that HTTPS provided a technical hurdle that our company and data providers worked around after about 6 months.

My guess is some MITM-type collection. Some data providers gave us IPs and some just gave us a tokenized ID. I don't know if ISPs provided IPs, but probably not.

Note that we did lots of data linking. Let's say an ISP provided us your age, URL, and Timestamp. We would link that into another data provider that provided past purchases, URL, and Timestamp (shopping toolbar/plugins do this) to get a bigger picture of who you are.
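To make the linking step concrete, here is a minimal sketch of joining two feeds on URL and timestamp. The field names (`url`, `ts`, `age`, `purchases`), the function `link_records`, and the two-second tolerance are all assumptions for illustration; the actual pipeline and schemas were not described.

```python
from datetime import datetime, timedelta


def link_records(isp_rows, toolbar_rows, tolerance=timedelta(seconds=2)):
    """Join two feeds on identical URL and near-identical timestamp.

    Hypothetical schema: ISP rows carry "age", toolbar rows carry "purchases".
    """
    # Index the toolbar feed by URL so each ISP row only scans candidates.
    by_url = {}
    for row in toolbar_rows:
        by_url.setdefault(row["url"], []).append(row)

    linked = []
    for isp in isp_rows:
        for tb in by_url.get(isp["url"], []):
            # Same URL requested at (almost) the same moment: likely one person.
            if abs(isp["ts"] - tb["ts"]) <= tolerance:
                linked.append({
                    "url": isp["url"],
                    "ts": isp["ts"],
                    "age": isp["age"],
                    "purchases": tb["purchases"],
                })
    return linked


isp_feed = [
    {"url": "https://shop.example/cart", "ts": datetime(2020, 1, 1, 12, 0, 0), "age": 34},
    {"url": "https://news.example/", "ts": datetime(2020, 1, 1, 12, 5, 0), "age": 34},
]
toolbar_feed = [
    {"url": "https://shop.example/cart", "ts": datetime(2020, 1, 1, 12, 0, 1),
     "purchases": ["running shoes"]},
]
profiles = link_records(isp_feed, toolbar_feed)
```

Neither feed needs a name or an IP for this to work; the shared URL and timestamp act as the join key.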


>get a bigger picture of who you are.

Sorry if I'm reading too much into this, but are you saying this data being collected and sold contains PII?


Well, PII is a bit of a nebulous term. Some websites still transfer some signup/user info in URL parameters or unencrypted responses. We would even see SSNs pop up now and then.

Most data being sold has some good faith effort to remove PII, but that's never 100% complete, and by utilizing multiple data sources, an industrious person or team could de-anonymize your data. We were mostly doing this type of work for segmentation and persona analysis. Targeting an individual was never a goal, but would not have been terribly difficult.

I'll give you an example. We might receive all URLs a person visited. Many contain personal information that would not be caught by the usual PII filtering process: https://mail.google.com/mail/u/1/#search/my+viagra+prescript...
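A rough sketch of why pattern-based scrubbing misses this kind of thing. The regexes, the `scrub` function, and the example URLs are hypothetical, not the filter any provider actually ran; the point is only that a filter looking for structured identifiers has nothing to match in free-text search terms.

```python
import re

# Hypothetical "good faith" PII patterns: US SSNs and email addresses.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def scrub(url: str) -> str:
    """Mask structured identifiers in a URL; free text passes through untouched."""
    url = SSN_RE.sub("[SSN]", url)
    return EMAIL_RE.sub("[EMAIL]", url)


# A structured identifier in a query string is caught...
signup_url = scrub("https://shop.example/signup?ssn=123-45-6789")

# ...but a sensitive search phrase in the URL looks like ordinary text.
search_url = "https://mail.example.com/mail/u/1/#search/my+prescription+refill"
scrubbed_search_url = scrub(search_url)
```

Here `scrubbed_search_url` comes back identical to `search_url`, so the sensitive phrase survives filtering.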


ISPs use a variety of techniques, such as deep packet inspection.

Many ISP support centers use commercial software [1] to display a detailed analysis of website usage whenever a customer calls in for support.

This includes the top websites visited, how much data was transferred for each site (yes, including those of a more salacious nature), the number and type of devices inside a home, nearby Wi-Fi networks, etc. [1]

This information can then be queried and used for marketing purposes at the subscriber level (or to target individuals within a dwelling).

[1] https://www.calix.com/content/calix/en/site-prod/library-htm...

[2] https://www.calix.com/compass/access-analyze.html

[3] https://www.calix.com/compass/consumer-connect-plus.html


They're tracking the TLS ones too. DNS and Server Name Indication (SNI) reveal which site you're going to. DPI means they're watching the content of the HTTP pages you go to as well.
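For the SNI part: unless Encrypted Client Hello is in use, the hostname travels in plaintext inside the TLS ClientHello, so anything on the path can read it without breaking the encryption. Below is a minimal sketch of that extraction, not how any particular DPI product does it. `build_client_hello` is a hypothetical helper that fabricates a bare-bones ClientHello for demonstration, and `extract_sni` assumes the whole ClientHello fits in a single TLS record.

```python
import struct
from typing import Optional


def build_client_hello(hostname: str) -> bytes:
    """Fabricate a minimal ClientHello record with an SNI extension (demo only)."""
    name = hostname.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name   # name_type 0 = host_name
    sni_data = struct.pack("!H", len(entry)) + entry        # server_name_list
    sni_ext = struct.pack("!HH", 0x0000, len(sni_data)) + sni_data
    body = (
        b"\x03\x03"                             # client_version
        + bytes(32)                             # random (zeros for the demo)
        + b"\x00"                               # empty session_id
        + struct.pack("!H", 2) + b"\x13\x01"    # one cipher suite
        + b"\x01\x00"                           # null compression
        + struct.pack("!H", len(sni_ext)) + sni_ext
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


def extract_sni(record: bytes) -> Optional[str]:
    """Return the plaintext SNI hostname from a ClientHello record, if present."""
    try:
        # 0x16 = handshake record, 0x01 = ClientHello message.
        if record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 5 + 4                 # record header + handshake header
        pos += 2 + 32               # client_version + random
        pos += 1 + record[pos]      # session_id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]   # cipher_suites
        pos += 1 + record[pos]      # compression_methods
        ext_end = pos + 2 + struct.unpack_from("!H", record, pos)[0]
        pos += 2
        while pos + 4 <= ext_end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000:  # server_name extension
                # Skip list length (2) and name_type (1), then read the name.
                name_len = struct.unpack_from("!H", record, pos + 3)[0]
                return record[pos + 5 : pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error):
        return None                 # truncated or malformed input
    return None


observed_host = extract_sni(build_client_hello("example.com"))
```

Note that this only yields the hostname, not the path or page content; DNS queries leak the same hostname separately.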



