Data Profiling task
- The number of rows in the table.
- The number of distinct values in the State column.
- The number of null or missing values in the Zip column.
- The distribution of values in the City column.
- The strength of the functional dependency of the State column on the Zip column; that is, whether the state is always the same for a given Zip value.
- Checking data quality before an incremental load. Use the Data Profiling task to compute the Column Null Ratio Profile of new data intended for the Customer Name column in a Customers table. If the percentage of null values is greater than 20%, send an e-mail message that contains the profile output to the operator and end the package. Otherwise, continue the incremental load.
- Automating cleanup when the specified conditions are met. Use the Data Profiling task to compute the Value Inclusion Profile of the State column against a lookup table of states, and of the ZIP Code/Postal Code column against a lookup table of ZIP codes. If the inclusion strength of the state values is less than 80% but the inclusion strength of the ZIP Code/Postal Code values is greater than 99%, the state data is bad while the ZIP Code/Postal Code data is good. In that case, launch a Data Flow task that cleans up the state data by looking up the correct state value from the current ZIP Code/Postal Code value.
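To make the profiles and the two workflow checks above more concrete, here is a minimal Python sketch of the same ideas. It is not how the Data Profiling task is implemented or configured. All function names, the sample rows, and the returned action strings are hypothetical, and the strength calculations are simple row-based approximations that may differ from the task's own definitions.

```python
from collections import Counter


def is_missing(value):
    """Treat None and blank strings as missing (an assumption of this sketch)."""
    return value is None or str(value).strip() == ""


def null_ratio(values):
    """Fraction of missing values, in the spirit of a Column Null Ratio Profile."""
    values = list(values)
    if not values:
        return 0.0
    return sum(1 for v in values if is_missing(v)) / len(values)


def inclusion_strength(values, lookup):
    """Fraction of non-missing values that appear in the lookup values."""
    allowed = set(lookup)
    present = [v for v in values if not is_missing(v)]
    if not present:
        return 1.0
    return sum(1 for v in present if v in allowed) / len(present)


def dependency_strength(rows, determinant, dependent):
    """Fraction of rows that agree with the most common dependent value
    for their determinant value; rows with a missing determinant are skipped."""
    groups = {}
    for row in rows:
        if is_missing(row[determinant]):
            continue
        groups.setdefault(row[determinant], Counter())[row[dependent]] += 1
    total = sum(sum(counts.values()) for counts in groups.values())
    if total == 0:
        return 1.0
    supporting = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return supporting / total


def incremental_load_action(customer_names, max_null_ratio=0.20):
    """Data quality gate: stop and notify when more than 20% of names are null."""
    if null_ratio(customer_names) > max_null_ratio:
        return "email_operator_and_stop"
    return "continue_load"


def should_clean_states(state_strength, zip_strength):
    """Cleanup condition: state values look bad while ZIP values look good."""
    return state_strength < 0.80 and zip_strength > 0.99


# Hypothetical sample data, for illustration only.
rows = [
    {"City": "Seattle", "State": "WA", "Zip": "98101"},
    {"City": "Seattle", "State": "WA", "Zip": "98101"},
    {"City": "Portland", "State": "OR", "Zip": "97201"},
    {"City": "Portland", "State": "WA", "Zip": "97201"},  # conflicting state
    {"City": "Boise", "State": "ID", "Zip": None},        # missing Zip
]

profile = {
    "row_count": len(rows),
    "distinct_states": len({r["State"] for r in rows}),
    "null_zips": sum(1 for r in rows if is_missing(r["Zip"])),
    "city_distribution": Counter(r["City"] for r in rows),
    "zip_to_state_strength": dependency_strength(rows, "Zip", "State"),
}
```

In an actual package, the equivalent decisions would be driven by the profile output of the Data Profiling task rather than by hand-written functions like these; the sketch only shows the arithmetic behind the 20%, 80%, and 99% thresholds.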
Please leave your feedback if you found this post useful. You can also post your query or scenario, and I will be happy to help.