File Upload | What is a Key?
When setting up a ChannelMix file upload, you are asked to define a "Key". But what is a key and how does it work? Read on to learn more!
What is a key?
A key is a unique identifier for a row of data. In the case of your marketing data, it is often a combination of several column names.
Why are keys important?
In database design, we use append and replace logic when dealing with updating data. What does that mean?
- Append: Adding new data to the end. For example, when you upload a new version of a file, we tack it onto the end of the existing data.
- Replace: Swapping out old data with new data. For example, replacing existing rows in your file with updated ones.
In summary, append and replace help us manage duplicate data during updates. Without this logic, duplicate data can inflate our metrics, causing data discrepancies.
A key tells the system what rows to create this append and replace logic around.
How does it work?
Let's say you have the following file with Twitter data.
Tweet ID | Tweet | Created Date | Report Date | Number of Comments |
---|---|---|---|---|
64626 | "Text of Tweet 1" | 2023/11/01 | 2023/11/01 | 3 |
64626 | "Text of Tweet 1" | 2023/11/01 | 2023/11/02 | 5 |
64626 | "Text of Tweet 1" | 2023/11/01 | 2023/11/05 | 7 |
In this example, we have one tweet that's been reported on three different days. On November 1st, the tweet had 3 comments, on November 2nd, 5, and on November 5th, 7.
How many unique identifiers do we need to distinguish each of these rows?
The correct answer is two! We need Tweet ID and Report Date.
Let's look at why.
Tweet ID is a unique identifier for a Tweet. For this reason, we don't need the text of the Tweet or the Created Date. It is understood that those two will be the same for each unique Tweet ID.
If we use Tweet ID alone, it will replace data that is not the same. For example, last week you uploaded the following:
Tweet ID | Tweet | Created Date | Report Date | Number of Comments |
---|---|---|---|---|
64626 | "Text of Tweet 1" | 2023/11/01 | 2023/11/01 | 3 |
This week you want to upload this additional data:
Tweet ID | Tweet | Created Date | Report Date | Number of Comments |
---|---|---|---|---|
64626 | "Text of Tweet 1" | 2023/11/01 | 2023/11/02 |
5 |
If your only key is Tweet ID, this new row will replace the original row. But they aren't the same, so you would be losing your data from 2023/11/01.
Now let's think about what happens if we add Tweet ID and Report Date both as keys. Now we see these two rows of data as unique, so they both would be included.
For example, you realize that Report Date 2023/11/05 should have 10 comments instead of 7. So your original file is as follows:
Tweet ID | Tweet | Created Date | Report Date | Number of Comments |
---|---|---|---|---|
64626 | "Text of Tweet 1" | 2023/11/01 | 2023/11/01 | 3 |
64626 | "Text of Tweet 1" | 2023/11/01 | 2023/11/02 | 5 |
64626 | "Text of Tweet 1" | 2023/11/01 | 2023/11/05 | 7 |
Now you upload the corrected file:
Tweet ID | Tweet | Created Date | Report Date | Number of Comments |
---|---|---|---|---|
64626 | "Text of Tweet 1" | 2023/11/01 | 2023/11/05 | 10 |
With the keys of Tweet ID and Report Date, the system will look for all rows in the original file with Tweet ID 64626 and Report Date 2023/11/05. It will then replace them with this new row. Your final results are as follows:
Tweet ID | Tweet | Created Date | Report Date | Number of Comments |
---|---|---|---|---|
64626 | "Text of Tweet 1" | 2023/11/01 | 2023/11/01 | 3 |
64626 | "Text of Tweet 1" | 2023/11/01 | 2023/11/02 | 5 |
64626 | "Text of Tweet 1" | 2023/11/01 | 2023/11/05 | 10 |