/*
DESCRIPTION:
Tracks anonymous patronage metrics by logging data at significant events.
The idea is for this data to be freely available and of no particular value
to anyone except to satisfy curiosity and help improve the service. Queries
on this table can tell you all sorts of things like how many thoughts are
matched within a given time period, match percentages with and without
hints, how long typical matches take, broad service error diagnostics. All
kinds of cool things! The table is meant to be ever-growing and periodically
archived, again eventually publically downloadable as well.
If made public, the rows might have to be shuffled in case by default they
are stored/served from the SQL database in the order they were written,
since this could let both parties identify each others' events if traffic is
low enough.
Eventually, if the service is popular enough and standard hashtags start to
get used, it would be nice to log the hashtag for each event. There would
have to be a whitelist of loggable hashtags in order to avoid logging
hashtags that themselves contain sensitive information, like if two people
use Thoughter with a prearranged secret "password"-type hashtag. And a
hashtag would probably only be logged for one party, not both, to make
association between two parties impossible or really hard.
Below you will find repeated reference to a "patronage funnel". This means
the progression of possible actions a Patron can cause or have happen to
them using the service. They are enumerated below in `event_action`.
A WEAKNESS:
A current weakness with this table is that, if the data is made public,
either intentionally or through leaks, then it may be possible for two
Patrons with a thought match to see what thought duration the other Patron
chose. I'm too tired to describe it fully, but it can be done through
deanonymization techniques using `event_timing`. The latter becomes less of
an issue as the service becomes more popular. But I leave further details as
an exercise to the reader.
There are several fixes that could work individually or together:
(1) Simply don't publicly release any data (intentionally).
(2) Obfuscate the data before publicly releasing.
(3) Get rid of the leak completely once this `event` table has enough data
that we don't really need to track anything related to thought duration.
(4) Simply don't care about the weakness since it's not a huge deal.
These fixes all have their various tradeoffs, including decreased accuracy,
consistency, and completeness of the event tracking.
*/
CREATE TABLE event (
/*
An event_id, AKA big random 8-byte integer, is generated and encrypted
into TLVs alongside other data in various tables like `thought`. It's then
copied here for every step in the patronage funnel enumerated in
`event_action`. It will thus be the same for every action for a given
Patron's experience through the funnel.
*/
event_id BIGINT(20) NOT NULL,
/*
The Patron actions (or reactions) we're interested in. There are a few
different combinations of flows through the funnel that can happen. The
actions are listed as chronologically as possible though.
For various reasons, all the `OPTOUT__THOUGHTER__*` actions are 1-funnel long with
their own `event_id` of zero.
*/
event_action ENUM (
'PUSH_THOUGHT',
'DELETE_THOUGHT',
'VERIFY_THOUGHTER_EMAIL',
'CHECK_PROOF',
'CHECK_THOUGHT_MATCH',
'ADD_THOUGHT_MATCH_NOTIFICATION_EMAIL',
'SEND_HINT',
'THOUGHT_MATCHED_EVENTUALLY',
'REMEMBER_ME',
'FORGET_ME__FROM_CHECK_THOUGHT_MATCH',
'FORGET_ME__FROM_THOUGHTER_FORM',
'FORGET_PAYMENT_INFO',
'SUBSCRIBE__NORMAL',
'SUBSCRIBE__FROM_CHECK_THOUGHT_MATCH',
'SUBSCRIBE__FROM_DONOR',
'ACTIVATE_SUBSCRIPTION',
'UNSUBSCRIBE',
'OPTOUT__THOUGHTER__GLOBAL__VERIFY_EMAIL',
'OPTOUT__THOUGHTER__GLOBAL__DUPLICATE_PUSH__EMAIL_NOT_YET_VERIFIED',
'OPTOUT__THOUGHTER__GLOBAL__DUPLICATE_PUSH__EMAIL_ALREADY_VERIFIED',
'OPTOUT__THOUGHTER__GLOBAL__HINT',
'OPTOUT__THOUGHTER__SPECIFIC__HINT',
'OPTOUT__THOUGHTER__UNKNOWN_SOURCE',
'OPTOUT__ONETIME_DONOR',
'OPTOUT__RECURRING_DONOR',
'DONATE',
'CANCEL_RECURRING_DONATION',
'SEND_EMAIL__NEW_CONSULTING_LEAD',
'SEND_EMAIL__THOUGHTER_VERIFICATION',
'SEND_EMAIL__HINT',
'SEND_EMAIL__THOUGHT_MATCH_NOTIFICATION',
'SEND_EMAIL__ACTIVATE_SUBSCRIPTION',
'SEND_EMAIL__SUBSCRIPTION_ALREADY_ACTIVE',
'SEND_EMAIL__DONOR_RECEIPT',
'SEND_EMAIL__ALREADY_DONATING',
'CRYPTO_CONSULTING__LANDING',
'CRYPTO_CONSULTING__CLICKTHROUGH'
) NOT NULL,
/*
A psuedo-enumeration of application result codes for the various actions
that a Patron can trigger. So this will be some kind of "success" status
hopefully most of the time, and otherwise indicate the overall type of
logical failure in the app. `event_status_info` may provide more
information, especially useful for exceptional failure situations.
TODO: This should be a proper SQL enumeration once the dust settles.
*/
event_status VARCHAR(128) NOT NULL,
/*
For now simply provides a way to add extra diagnostic information if
`event_status` indicates some kind of failure, especially a rare one. More
precisely, for now at least, this is the class name of the Java Exception
that was returned or thrown by a piece of application logic. Either that
or "NA" (not applicable).
There may be more stuff thrown in here in the future, but it of course
should never contain any personally identifiable information.
TODO: Make nullable.
*/
event_status_info VARCHAR(255) NOT NULL,
/*
For event_action.PUSH_THOUGHT this is an epoch timestamp, rounded to the
nearest hour as additional defense against deanonymization. Otherwise it
is the epoch time delta since the previous event in the funnel. This is
a tricky column as far as not leaking any unwanted information when this
table is made public. For example *just* using timestamps for each event
in a funnel would be simpler, but wouldn't allow us to do the rounding
obfuscation without losing considerable accuracy in the time deltas.
Also as a minor bonus I think storing time deltas makes queries easier to
write in general, since application code has already done some of the
"pre-processing". Some specific queries are harder with deltas of course.
In the future some or all of the deltas might be rounded to the nearest
second or minute or hour or *something* to further make any kind of
deanonymizing link between an individual Patron and their event funnel
harder. For now I'm keeping it exact to the millisecond because there's
too much value for improving the service so early after its release. And
anyway it becomes less of a concern as more people use the service and the
noise in the data makes it harder to identify yourself and the other
Patron by timing analysis.
Either way all this will be considered more as the service matures and
before this table is actually made public, since proofing metadata against
deanonymizing tactics is a tricky problem.
*/
event_interval_timing BIGINT(20) NOT NULL,
/*
The duration in nanoseconds that the logic behind the event took to
execute. Basically measures how long our SQL queries, emails, and crypto
are taking on a per-event basis. This column is so far "unique" in that
it represents nanoseconds, unlike other time-related columns in the
database as of this writing, which all represent milliseconds.
*/
event_logic_timing BIGINT(20) NOT NULL,
/*
An arbitrary integer value that is event-specific. For example for
`event_action.PUSH_THOUGHT` this will be the thought duration in
milliseconds.
TODO: Make nullable. Requires some Ikwyt improvements.
*/
event_integer_value BIGINT(20) NOT NULL,
/*
An arbitrary integer value that is event-specific. For example for
`event_action.SEND_HINT` this will be the hint type.
*/
event_string_value VARCHAR(255) DEFAULT NULL
);