April 17, 2020

Data Enrichment 2.0


When we started working on data enrichment at Synapse, our goal was to ensure that no one ever has to search “What’s this charge?” again for a card transaction.

With V1 Data Enrichment, we were able to accomplish that for 92% of the transactions that were being conducted through our Card Issuance product.

This is the kind of information we were able to return with our V1 product.

With this data, developers can build a clean transaction feed in which consumers identify merchants at a glance while scrolling. Each transaction includes the merchant logo and a clean name without ancillary data.

Example Transaction Feed Leveraging Enrichment

With V2, our goal is to accomplish the following:

  1. Improve coverage and accuracy
  2. Add more contextual information, such as location, entity and payment processor details
  3. Set the stage for financial advice

Here is how the new enriched information looks:

Details of each returned profile can be obtained through a separate API. This allows us to keep all the profiles up to date, avoid overloading the transaction objects with excessive data, and add new knowledge features without changing the transaction schema. The information looks like this:

Under the hood

Enrichment 2.0 adopts a one-to-many relationship between transactions and business profiles. Instead of searching for the single best merchant, we allow the transaction to be matched to a business profile in any of three scopes:

  1. Entity: Corporate entity delivering a service/product to the user. (e.g. Netflix, Starbucks, Spotify, etc.)
  2. Facilitator: Third party that facilitates a payment to the entity. (e.g. PayPal, Square, Venmo, etc.)
  3. Location: Specific retail location. (e.g. Blue Bottle Coffee at 168 2nd St.)
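To make the one-to-many idea concrete, here is a minimal sketch of how a transaction could reference profiles by scope. This is an illustration only, not our actual schema: the class names, field names and profile IDs are hypothetical, and it assumes at most one profile per scope.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Scope(Enum):
    ENTITY = "entity"            # corporate entity, e.g. Netflix
    FACILITATOR = "facilitator"  # third party facilitating the payment, e.g. Square
    LOCATION = "location"        # specific retail location


@dataclass
class EnrichedTransaction:
    transaction_id: str
    # Assumption: at most one profile ID per scope; unmatched scopes are absent.
    # Only IDs are stored, so profile details can be fetched separately.
    profile_ids: Dict[Scope, str] = field(default_factory=dict)

    def attach(self, scope: Scope, profile_id: str) -> None:
        """Link this transaction to a business profile in the given scope."""
        self.profile_ids[scope] = profile_id

    def profile_id_for(self, scope: Scope) -> Optional[str]:
        """Return the linked profile ID for a scope, or None if unmatched."""
        return self.profile_ids.get(scope)


# Hypothetical example: a coffee purchase paid through a facilitator.
tx = EnrichedTransaction("txn_001")
tx.attach(Scope.ENTITY, "profile_blue_bottle")
tx.attach(Scope.FACILITATOR, "profile_square")
tx.attach(Scope.LOCATION, "profile_blue_bottle_168_2nd_st")
```

Storing only profile IDs on the transaction mirrors the idea of keeping profile details behind a separate API, so a profile can change without touching the transaction object.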

We attempt to achieve the best enrichment for each transaction in real time, while keeping in mind the high impact of incorrect results. For this reason, we’ve built a review pipeline that puts human oversight on any uncertain predictions. The whole lifecycle of a transaction is depicted in the following flowchart:

Transaction lifecycle in Enrichment product

When a new transaction is received for enrichment, we concatenate all of its data fields, remove duplicate tokens, strip all special characters and spaces, and lowercase the resulting string. This string is a distinctive characteristic of the merchant initiating the transaction; we call it a “merchant signature”. Often the signature is not new, and we can simply look up the merchant information already associated with it. In that case enrichment finishes during transaction processing and the enrichment information status is set to SETTLED.
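A minimal sketch of the signature step might look like the following. It is an illustration under assumptions, not our production code: the function name and the example field values are hypothetical, tokens are taken to be whitespace-separated, duplicates are removed after normalization, and only ASCII letters and digits are kept.

```python
import re
from typing import Iterable


def merchant_signature(fields: Iterable[str]) -> str:
    """Build a merchant signature from the raw data fields of a transaction."""
    seen = set()
    tokens = []
    for value in fields:
        for token in value.split():
            # Lowercase, then drop everything except ASCII letters and digits.
            cleaned = re.sub(r"[^a-z0-9]", "", token.lower())
            # Skip empty tokens and tokens already seen in an earlier field.
            if cleaned and cleaned not in seen:
                seen.add(cleaned)
                tokens.append(cleaned)
    # Joining without a separator removes all remaining spaces.
    return "".join(tokens)


# Hypothetical field values for illustration.
signature = merchant_signature(["NETFLIX.COM", "NETFLIX.COM 866-5797172", "CA"])
```

Here `signature` is `"netflixcom8665797172ca"`: the repeated `NETFLIX.COM` token is kept once, and punctuation, spaces and case are gone. A string like this can then serve as a lookup key for previously seen merchants.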

If a full match is unavailable, we try to find the closest match possible. To match the signature properly, we first need to narrow down the search, so we trained a machine learning model to predict the scope of a merchant signature. The model was trained on tens of thousands of transactions and assigns each signature to the entity, location or chain scope.

To find the most similar signatures, we send a query to the Elasticsearch cluster that looks only at signatures of the predicted scope. Finally, we validate the search results using separate logic for each scope:

  • For entity transactions, we compare the merchant name only, because there are so many variations of the address field for the same entity. NETLIX.COM 8008001300 NL, NETFLIX.COM 866–5797172 CA 92035 and 42 (!!) other Netflix variations should be considered a match.
  • For location transactions, we require a very tight bound to declare a match, because sometimes the difference between two small businesses is just one letter.
  • For chain transactions, we accept a match at the entity level more loosely, because the address information doesn’t need to be the same. The location-level match is more stringent: we accept it only if the zip code is the same.
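The three validation rules above can be sketched roughly as follows. This is a simplified illustration, not our actual matching logic: it uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure, and the record type, function names and threshold values are all hypothetical placeholders.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Tuple


@dataclass
class MerchantRecord:
    name: str
    address: str = ""
    zip_code: str = ""


# Hypothetical thresholds: loose for entity names, very tight for locations.
ENTITY_NAME_THRESHOLD = 0.85
LOCATION_THRESHOLD = 0.97


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_entity_match(incoming: MerchantRecord, known: MerchantRecord) -> bool:
    # Entity scope: compare the merchant name only and ignore the address.
    return similarity(incoming.name, known.name) >= ENTITY_NAME_THRESHOLD


def is_location_match(incoming: MerchantRecord, known: MerchantRecord) -> bool:
    # Location scope: name and address together must be nearly identical.
    left = f"{incoming.name} {incoming.address}"
    right = f"{known.name} {known.address}"
    return similarity(left, right) >= LOCATION_THRESHOLD


def chain_match(incoming: MerchantRecord, known: MerchantRecord) -> Tuple[bool, bool]:
    # Chain scope: loose name match for the entity level; the location
    # level additionally requires identical, non-empty zip codes.
    entity_ok = is_entity_match(incoming, known)
    location_ok = (
        entity_ok
        and incoming.zip_code != ""
        and incoming.zip_code == known.zip_code
    )
    return entity_ok, location_ok
```

With these placeholder thresholds, `NETLIX.COM` and `NETFLIX.COM` pass the entity check despite completely different address fields, while two short location strings that differ by a single letter fail the location check. Real thresholds would need tuning, since a fixed ratio becomes more permissive as strings get longer.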

If we find a match on all required levels, the enrichment information status is set to PENDING and the transaction is queued for review. Otherwise, it is added to the scraping queue with the SCRAPER status. All the steps described above happen in real time during transaction processing.

In the next step, a scheduled job regularly pulls all transactions with the SCRAPER status and sends them to a Selenium Docker container, which attempts to gather all the relevant information about the merchant from search engines. If the web scraper finds something, we create a temporary profile and assign it to the transaction. We also check that we are not creating a duplicate profile.

Web scraping is an inherently unstable process and can produce unexpected results. For example, the scraper once returned “Aramark $35,000 Jobs, Employment” as a business name. This is why, after web scraping, all the transactions are marked as PENDING and queued for human review.

Human validation is the final stage for all transactions with the PENDING enrichment information status. Transactions queued for review are updated and corrected by the Synapse Data Annotation team. The team uses a dedicated UI that gives them tools to create, search and update merchant profiles, as well as to connect merchant profiles to transactions and disconnect them.

Transaction annotation UI

When an annotator has reviewed the transaction in the UI, the transaction and its signature are marked as REVIEWED. This completes the lifecycle of a transaction in the Enrichment product.
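The status flow described above can be summarized as a small state machine. The sketch below is illustrative only: the four status names come from this post, but the function names and the transition table are hypothetical, and it assumes SETTLED and REVIEWED are terminal states.

```python
from enum import Enum


class EnrichmentStatus(Enum):
    SETTLED = "SETTLED"    # exact signature match found in real time
    PENDING = "PENDING"    # waiting for human review
    SCRAPER = "SCRAPER"    # waiting for the web scraper
    REVIEWED = "REVIEWED"  # confirmed by an annotator


def initial_status(exact_signature_match: bool, validated_close_match: bool) -> EnrichmentStatus:
    """Status assigned in real time, during transaction processing."""
    if exact_signature_match:
        return EnrichmentStatus.SETTLED
    if validated_close_match:
        return EnrichmentStatus.PENDING
    return EnrichmentStatus.SCRAPER


# Assumption: SETTLED and REVIEWED have no outgoing transitions.
ALLOWED_TRANSITIONS = {
    EnrichmentStatus.SCRAPER: {EnrichmentStatus.PENDING},   # after web scraping
    EnrichmentStatus.PENDING: {EnrichmentStatus.REVIEWED},  # after human review
    EnrichmentStatus.SETTLED: set(),
    EnrichmentStatus.REVIEWED: set(),
}


def advance(current: EnrichmentStatus, new: EnrichmentStatus) -> EnrichmentStatus:
    """Move to a new status, rejecting transitions the lifecycle does not allow."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.value} to {new.value}")
    return new
```

For example, a transaction with no exact or close match starts as SCRAPER, moves to PENDING once the scraper has run, and ends as REVIEWED after annotation.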

What’s Next

We’ve built all this logic to deliver high-quality contextual information for each transaction because we believe many great features can be built on top of this foundation. Once each transaction is mapped to a business, lots of exciting things become possible. Adding company logos, aggregating spending by merchant and showing transactions on a map are easy but powerful examples.

We are also starting to put together a subscription management product that will combine our chatbot product line, instant auth and enrichment.

Example Visualization of Subscription Management

On top of that, once merchant classification is a solved problem, it becomes much easier to build other financial and merchant insights. Here is an example of what we have in mind:

Example Visualization of Transaction Insights

We are also really excited to add analytics to the enrichment services. We think that each transaction can be interesting in multiple dimensions. From a customer’s point of view, we can describe a transaction in terms of total and average spending at a specific merchant and in the merchant category overall. At the same time, a transaction can be viewed from the merchant’s point of view and compared to the distribution of all transactions at that merchant. We think that making all this information easily accessible can be of huge value to our clients and their customers!

We are super excited to learn how this product will be used by our clients and how we can make it better. Stay tuned!