Over at NewsBlur there's a mild uproar about items automatically being marked read after 14 days (https://getsatisfaction.com/newsblur/topics/do_unread_items_sunset_after_14_days). Google Reader used to mark items read after 30 days, so I'm guessing that storing this data for a long time is a hard or unsolved problem.
Has it been solved, and what's the most space-efficient way to store the per-item unread state for all users? In my mind it forms a sparse matrix of users by items. To get some data compression you'd want to group users' unread state together, but as soon as one person marks an item as read, you have to extract them from the compressed data set...
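To make the question concrete, here's a naive Python sketch of one user's row of that matrix as I picture it. Everything in it (`UserReadState`, `cutoff`, `read_ids`, `sunset`) is a name I made up for illustration; I have no idea whether NewsBlur or Google Reader do anything like this. The assumption is that each user keeps a cutoff timestamp, with everything at or before it counting as read, plus a set of newer item IDs they've explicitly read.

```python
from dataclasses import dataclass, field


@dataclass
class UserReadState:
    """Hypothetical per-user row of the sparse unread matrix."""

    # Items published at or before this timestamp count as read (the "sunset").
    cutoff: int = 0
    # IDs of items newer than the cutoff that the user has explicitly read.
    read_ids: set = field(default_factory=set)

    def mark_read(self, item_id, published):
        # Items at or before the cutoff are already implicitly read.
        if published > self.cutoff:
            self.read_ids.add(item_id)

    def is_unread(self, item_id, published):
        return published > self.cutoff and item_id not in self.read_ids

    def sunset(self, new_cutoff, published_by_id):
        """Advance the cutoff and drop the entries it now covers.

        published_by_id maps item ID to publish timestamp; IDs missing
        from it are treated as old and dropped.
        """
        self.cutoff = max(self.cutoff, new_cutoff)
        self.read_ids = {
            i for i in self.read_ids
            if published_by_id.get(i, 0) > self.cutoff
        }


state = UserReadState()
state.mark_read(1, published=100)
state.sunset(150, {1: 100, 2: 200})  # item 1 is now covered by the cutoff
```

If this is roughly right, the sunset is what keeps `read_ids` bounded: without it, the set grows with every item a user ever reads.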
I have no practical experience with this, but it intrigues me, and real-world stories would be great!