Is AutoNumber() really worth the hassle?

AutoNumber() can be a pain when trying to debug problems in a QV data model, especially if you are integrating multiple data sources and need the original source’s keys to trace problems. Sometimes I get the impression that people recommend it based on the idea it will make the links between tables more efficient because numbers are more efficient key fields than strings, but given that all data whether a string or a number value is assigned a binary record pointer automatically, QlikView makes links between tables using those same binary addresses.

Autonumber() does nothing more than slim down the RAM QlikView uses by reducing the size of the symbol table that contains the unique values of a field.  Therefore, it makes no sense to transform an ID with autonumber() for a key field and also keep the ID in its original state in another column.  The only exception to this rule is when the original ID is used as a dimension or in an expression.

I started doubting the function’s benefits during performance tests of a 5,000,000,000-row / 30-field data model when I saw no change in the data model’s response time with or without using autonumber().  Recently, I optimized a QlikView application that had a file size of 1 GB and reduced it to 400 MB.  As part of the process to renovate the data model, I removed the autonumber() functions from the key fields.  I even had the pleasure to witness a nested autonumber() in a key field that was made out of two other key fields also generated by autonumber().

In the end and since the creation of the data model involved a unstable data integration and facts could contain keys that have no pair, I didn’t apply the autonumber() and left the key field with its original values so that it would be easier to trace the missing data at the source.  As was the case for this data model, removing unused fields discovered using Rob Wunderlich’s Document Analyzer and link tables (a structure I haven’t used for years) are more effective techniques to optimize a data model than using the autonumber() for key fields.

With that said, if you have a key field that contains a large number of unique values that happen to be long strings, autonumber() will help reduce the RAM used by QlikView, but otherwise I don’t see any reason to over program a QlikView script and litter it with autonumber() functions.

Karl

Author: Karl Pover

Owner of Evolution Consulting, which provides Qlik consulting services throughout Mexico. Author of "Learning QlikView Data Visualization" and "Mastering QlikView Data Visualization." Qlik Luminary since 2014.

8 thoughts on “Is AutoNumber() really worth the hassle?”

  1. Using autonumber() on key fields has another unmentioned benifit. I don’t want my end users to be using keys as value-dimensions front end. By changing the key values to numeric ones I force the user to use the dimension or transaction value, the key value will have minimal front end value besides linking tables together.

    In addition to using autonumber() I make sure the field is $hidden to prevent front end use.

  2. Hi Karl,

    So I have recommended autonumber for a while without thinking through your analysis above. Thanks for bringing clarity to this subject.

    Firstly, I imagine the same logic would apply to the different iterations of the hash function. Agree?

    Secondly, although I think you are correct about a net “no performance benefit” effect in the UI, don’t you believe the reduced total RAM footprint is worth its use? This might especially be of concern where the RAM is highly leveraged in the server environment, ie the adage “every little bit counts”.

    Keep up the great work, sir.

  3. Hi Aaron,

    Thanks for your comment. I agree that using the autonumber may be beneficial in reducing the RAM footprint of the QlikView application. I recently recommended it be applied to a extensive composite key that was taking up tons of RAM according to Rob Wunderlich’s document analyzer.

    The same logic applies to the hash functions when it replaces something like autonumber(key1 & ‘_’ & key2) with hash128(key1,key2) because it hides important information that may slow down your ability to develop, test and maintain a QlikView data model.

    I don’t think it is wise to take “best” practices for granted and apply them without thinking. Every once in awhile it’s good to step back and think about all the effects their use has. I wonder what other “best” practices we take for granted?

    Karl

  4. Hi Karl,
    I didn’t quite understand this comment:
    **Therefore, it makes no sense to transform an ID with autonumber() for a key field and also keep the ID in its original state in another column**
    I’m not sure why you would do that? If I’m autonumbering, then I’m sure to make sure I drop the original ID fields once the autonumber() process has completed. One of the benefits of the autonumber process is that so long as you have a sequential numeric key, then the column takes no space in the symbol tables. This can be a big saving both in terms of memory but also disk space and hence reload times of your QVD.
    It’s a useful discussion though. Autonumbering is not always the best treatment, but when it is, it’s very useful.
    Sorry for chiming in on this so late.
    Kind regards
    George Beaton

    1. Hi George,
      Thanks for you comment. It’s actually great to read past posts I’ve written and decipher what I wanted say. Obviously, some of my opinions could even change over time.

      First, everything you say about the auto-number not using a symbol table is correct and I agree that can reduce memory usage. How great that reduction would be depends on the original IDs. I would be selective as to saving an auto-number in a QVD since I would have to make that sure all the tables that share the same auto-numbered key fields are created in the same script which might not always be the case.

      As far as dropping the original ID fields, I would only do that in some last step during the development stage after I’ve validated the data and know the data model is not going to change. If we store the original ID, we can find the cause of incorrect data so much easier since we can cross reference the data in QlikView with the source system. I’ve also found that data validation is a continuous effect not restricted to some final development stage. Changes to the source system, the QlikView data model, or a grumpy business require that we constantly validate data, and I think the auto-number slows down that process.

      Again, if auto-number noticeably reduces memory and response time then I’ll use it. The post was originally meant to make people think about the good and bad of auto-number and not apply it mindlessly. Again, thanks for the comment so that we can keep the discussion alive.

      Best, Karl

  5. Hi Karl

    I am currently working on a Qlik Sense project and the debate about using AutoNumber() has surfaced. The source of data for all our Qlik Sense applications are conformed data marts. These data marts already have unique keys for all facts and dimensions, with a data type and length of NVarchar(40).

    Our deployment is using Microsoft Azure VMs, so we can scale the solution through adding more VMs.

    In our situation I just don’t see the point of using AutoNumber() for our already unique keys? Thoughts?

    Best, Paul

    1. Qlik Sense will determine a column’s data length at the time it is loaded regardless of what it may be in the source database. If you have 1,000,000 unique values that are really 80 bytes long then you save 76 MB using autonumber(). I would definitely use it in this case.

      However, if your unique key contains 1,000,000 unique integers that are no bigger than 4 bytes long then you’ll only save 3.8 MB using autonumber(). This won’t have any impact on the application’s performance unless you have a fact table with an extreme number of unique keys (eg. 20) with similar characteristics.

      In conclusion, if your data model doesn’t have any of these extreme characteristics, the use of autonumber() won’t have any noticeable impact on a Qlik Sense application’s performance. In which case, I usually don’t use it so that I can more easily debug data issues by quickly matching a unique key in QlikView with the same key in the source database.

      I hope that helps.

      Karl

Leave a Reply