LAMS Development
  1. LAMS Development
  2. LDEV-4438

Duplicate AssessmentResults with the 'latest' flag ON

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.0
    • Component/s: Tool Assessment
    • Labels:
      None

      Description

      As reported by Marcin,
      "The thing is that we can't completely prevent duplicate calls. There is too much going on.
      We've got auto save, network errors that can make browser repeat a request, cheating students and simple fast-clickers.
      We need to be ready for duplicate calls and we are not.
      Yes, before creating a new result we check if one already does not exist and we do it in a transaction,
      but a race condition can still cause trouble.
      We don't have transaction isolation strong enough and we won't have it. It would have too much impact on our DB.
      What we can do is either prevent duplicate results from being created or accept them.
      Here are few solutions:
      1. Add another unique index
      We already have got one for finish_date, now we need one for start_date.
      It's a good way for ensuring that a immediate duplicate request won't create duplicate results.
      We can use this solution along with other ones.
      Disadvantages of this solution is that (as you noticed) not all duplicates have the same start and/or finish date
      and that the extra index can have impact on speed of data manipulation in the results table.
      The impact can be easily measured if we can clone the production DB and play with selecting and inserting stuff
      into the indexed and unindexed results table.
      I suspect there will be little difference and who knows, the index may even speed up selecting

      2. Synchronise result creating code
      This way just one result can be created on one node at the same time.
      Since the duplicates reference the same user, duplicate requests go to the same node
      and we don't have to worry that synchronisation is not distributed.
      We can tune it so that the lock is made for the given user, not the whole node,
      i.e. only one result for the given user can be created at the same time
      and not only one result for any user can be created at the same time.
      The implementation will be a bit odd, but it's doable.
      I personally don't like the idea of synchronisation.
      It sounds bottleneck-ish and we should avoid it in our clustered, multithreaded environment.
      But it's an idea and we have used and still use some synchronisation in LAMS.

      3. Do an extra check just before saving a result
      What we do now is check if a latest result already exists
      and then we do plenty of stuff and then we save a new result.
      If we hit the DB again just before the save, maybe we could detect the problem and stop. Maybe.

      4. Make start and finish dates more accurate.
      We store them up to a second. A lot of things can happen within a second.
      If we had milliseconds, maybe there would be no (less?) duplicates.

      5. Get rid of "latest" column
      Why the hell we've got "latest" column?
      With race conditions this is bound to fail.
      We can sort results descending by start date and get the first result.
      And what we get? The latest result. Wouldn't this do?
      This solution accepts duplicates but ignores them.

      6. Pick one of duplicates
      This is a variation of #4.
      We leave the latest column, but instead of expecting an unique latest result,
      we get the first one. For 99% of cases there will be just one result.
      In the remaining, duplicate cases we pick just one and don't care about the other.
      It's OK as long as we always pick the same result of the duplicate pair."

      I reviewed all of them and tried to implement #5 and #6 but it led to a severe drop down of the efficiency of SQL queries run from AssessmentUserDAOHibernate.java.

      So we can try the next idea. Add unique constraint (assessment_uid, user_uid, latest) to tl_laasse10_assessment_result which will apply only in case latest=1. It can be done by making 'latest' field to contain only NULL or 1 (as opposite to the current 0 or 1). This way constraint will apply only to when 'latest' is 1.
      It can also be done using triggers (as suggested in https://stackoverflow.com/a/18460214) but we'll reserve it for later if needed.

        Activity

        Andrey Balan created issue -
        Hide
        Andrey Balan added a comment -
        Done.
        Show
        Andrey Balan added a comment - Done.
        Andrey Balan made changes -
        Field Original Value New Value
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Ernie Ghiglione added a comment -
        Adding Andrey's comments for documentation:

        Marcin, good job, man! It was so helpful to have all those solutions listed, and have them in such an intuitively clear manner. I really liked working with them, initially I thought that #1 + #5 or #6 will work out but it didn't, so I had to search for another solution.

        > 1. Add another unique index

        I suspect there will be little difference and who knows, the index may even speed up selecting
        From the beginning I thought we'll ultimately end up implementing this option. But I guess with the solution I'll describe later below there is no need in doing it.

        > 2. Synchronise result creating code

        I just trusted your expertise that this is not the best option. However, at the times of despair (when I ran out of available solutions) I thought this is the only way to go.


        > 3. Do an extra check just before saving a result

        Yes, it will eliminate some duplicates, but definitely not all of them.


        > 4. Make start and finish dates more accurate.

        If you save dates as milliseconds then you'll get fewer duplicates that take into account finishDates, i.e. less duplicate (assessment_uid, user_uid, start_date) pairs. But because of this increase of accuracy you'll get more duplicates of the pairs (assessment_uid, user_uid, latest=1), as start_date constraint will no longer apply.

        > 5. Get rid of "latest" column

        As I mentioned in the previous email, we can't remove 'latest' column (although it's a very tempting option indeed), as it will decrease the efficiency of complex SQL queries that without 'latest' column should start using sub-queries.


        > 6. Pick one of duplicates
        > This is a variation of #4.


        I expected then that this would be the perfect pick.
        And it works flawlessly indeed for the queries from AssessmentQuestionResultDAOHibernate.java where you simply need to choose one of the duplicates.
        But it works with varied success for the queries from AssessmentUserDAOHibernate.java.
        Especially for such simple queries like
            "SELECT grade FROM tl_laasse10_assessment_result WHERE finish_date IS NOT NULL AND latest = 1 AND session_id = :sessionId";
        where we should also add 'GROUP BY user_uid' to avoid duplicates. To process this query database server needs about 30 seconds, which is a no go.

        My solution was to add unique constraint (assessment_uid, user_uid, latest), which should only apply when latest=1. This is achievable by forcing 'latest' column to store only 'NULL' and '1' values (as opposite to the usual '0' and '1'). (In case of NULL this constraint does not apply).
        Overall, I think this constraint should solve the problem with AssessmentResult duplicates.
        Show
        Ernie Ghiglione added a comment - Adding Andrey's comments for documentation: Marcin, good job, man! It was so helpful to have all those solutions listed, and have them in such an intuitively clear manner. I really liked working with them, initially I thought that #1 + #5 or #6 will work out but it didn't, so I had to search for another solution. > 1. Add another unique index I suspect there will be little difference and who knows, the index may even speed up selecting From the beginning I thought we'll ultimately end up implementing this option. But I guess with the solution I'll describe later below there is no need in doing it. > 2. Synchronise result creating code I just trusted your expertise that this is not the best option. However, at the times of despair (when I ran out of available solutions) I thought this is the only way to go. > 3. Do an extra check just before saving a result Yes, it will eliminate some duplicates, but definitely not all of them. > 4. Make start and finish dates more accurate. If you save dates as milliseconds then you'll get fewer duplicates that take into account finishDates, i.e. less duplicate (assessment_uid, user_uid, start_date) pairs. But because of this increase of accuracy you'll get more duplicates of the pairs (assessment_uid, user_uid, latest=1), as start_date constraint will no longer apply. > 5. Get rid of "latest" column As I mentioned in the previous email, we can't remove 'latest' column (although it's a very tempting option indeed), as it will decrease the efficiency of complex SQL queries that without 'latest' column should start using sub-queries. > 6. Pick one of duplicates > This is a variation of #4. I expected then that this would be the perfect pick. And it works flawlessly indeed for the queries from AssessmentQuestionResultDAOHibernate.java where you simply need to choose one of the duplicates. But it works with varied success for the queries from AssessmentUserDAOHibernate.java. Especially for such simple queries like     "SELECT grade FROM tl_laasse10_assessment_result WHERE finish_date IS NOT NULL AND latest = 1 AND session_id = :sessionId"; where we should also add 'GROUP BY user_uid' to avoid duplicates. To process this query database server needs about 30 seconds, which is a no go. My solution was to add unique constraint (assessment_uid, user_uid, latest), which should only apply when latest=1. This is achievable by forcing 'latest' column to store only 'NULL' and '1' values (as opposite to the usual '0' and '1'). (In case of NULL this constraint does not apply). Overall, I think this constraint should solve the problem with AssessmentResult duplicates.
        Hide
        Ernie Ghiglione added a comment -
        Thanks Andrey.
        Show
        Ernie Ghiglione added a comment - Thanks Andrey.
        Ernie Ghiglione made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Andrey Balan
            Reporter:
            Andrey Balan
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development