Transactions missing during statement import

Hello!

One of my users uploaded a bank statement containing at least 10,773 transactions. FreeAgent took 9 minutes and 3 seconds to create all the corresponding transactions. At 6 minutes in, my app queried the bank transaction endpoint for all transactions updated since 30 minutes previously. 30 minutes later, my app again queried the bank transaction endpoint for all transactions updated since the previous query.

Unfortunately the endpoint never returned 10,773 transactions from that statement.

I’ve been thinking about database transactions and isolation levels but I can’t think of a way this could have happened.

Is there a way to avoid missing transactions in future?

Many thanks,

Andrew

Hi Andrew, and happy new year!

Thanks for flagging this. Just to clarify, were the 10k+ transactions successfully created in your user’s account or did the upload fail? If the transactions were uploaded, were you able to retrieve them with subsequent queries? If so, was the updated_at timestamp set to a time before or after the two queries you sent shortly after the statement upload?

Best wishes,

Ewa

Hi Eva, Happy New Year to you too!

The 10k+ transactions were successfully created. I was able to retrieve them with subsequent queries.

Earliest of the “missing” transactions:

  • uploaded at: 01.01.2021 23:56:46
  • created at: 01.01.2021 23:57:44

Latest of the “missing” transactions:

  • uploaded at: 01.01.2021 23:56:46
  • created at: 02.01.2021 00:05:49

My first query:

  • run at (approx.): 02.01.2021 00:03:00
  • updated_since filter (approx.): 01.01.2021 23:33:00

My second query:

  • run at (approx.): 02.01.2021 00:33:00
  • updated_since filter (approx.): 02.01.2021 00:03:00

Does that help? Let me know if you need anything else :slight_smile:

Best wishes,

Andrew

Thanks a lot for these additional details, I’ll see what I can find out from our logs and will get back to you as soon as I know more.

Thanks Ewa. Let me know if you’d like specific transaction ids.

Hi Andrew,

Thanks for this, I was able to find your customer’s problematic statement upload and requests made by DoubleAgent every half an hour to that account. After doing some digging, we think the reason why none of the bank transactions were detected by your first query at 00:03:00, and only some (~10 000 out of ~14 000) were detected by your query at 00:33:00 is caused by the fact that statement uploads are wrapped in a Rails transaction, which means that records are not fully committed to the database until everything within the transaction is processed (although the created_at and updated_at timestamps are set throughout that time span).

If you have access to a Rails console, you can test this by updating two things inside an ActiveRecord transaction with a 1 minute sleep in between — this results in the updated_at being set to the time the update was made, but the change not being made until the transaction completes.

In your case, because the upload started at 23:56 and, due to the high number of processed bank transactions, didn’t finish until 00:06, the bank transactions were not ready to be returned in response to DoubleAgent’s request at 00:03, and as your second request at 00:33 asked for bank transactions with the updated_at timestamp set within the past half hour, it only picked up on a subset of records from the statement.

To avoid this happening in the future, you could introduce some time overlap in your polling requests or alternatively use a call to /v2/bank_transactions?bank_account=:bank_account&last_uploaded=true when trying to detect the latest upload. I realise this is not ideal, so I’ll flag this to the team responsible for the banking functionality in the app; independently, we’re also beginning to do some initial (albeit long overdue) research into the introduction of webhooks, which no doubt would help you avoid such pitfalls as well.

Best wishes,

Ewa

Hi Ewa,

Thanks as always for investigating! That all makes perfect sense. I thought everything would be in a transaction but for some reason assumed the updated_at would be set to the time the transaction committed rather than each record’s actual creation time – which doesn’t make any sense when it’s spelled out :wink:

I can see there’s no tidy workaround. Using last_uploaded=true will miss transactions if there is more than one upload between queries. Introducing a time overlap could work but it depends on guessing a large enough overlap and introduces work to discard transactions already known. Web hooks would be great from my end but I can see they are non-trivial to implement for the server.

Anyway, now we know the problem I can look out for it in future and try to mitigate it.

Thanks again!

Best wishes,
Andrew

1 Like

Hi Andrew,

You’re most welcome! I can fully appreciate the predicament this causes – if it’s any consolation, imports that take longer than a minute comprise less than 0.1% of all bank transaction imports (although in the last 30 days we did have ~60 imports that took over 10mins).

One more solution I can think of is checking the uploaded_at timestamp of transactions which are detected by your polling request, and, if that timestamp is earlier than your updated_since view filter, sending another request to detect anything created since the uploaded_at time – which I appreciate would nonetheless complicate your polling logic. In any case, I hope you manage to find a suitable solution for DoubleAgent and I’m glad that at least we were able to shed some light on this – admittedly rather confusing – behaviour!

Best wishes,

Ewa

One more solution I can think of is…

That’s a good idea! I’ll do that. Thanks again!

Best wishes,

Andrew

1 Like