We need to de-dupe our customer list for our integration with Hubspot CRM, which uses unique emails as a key. I’m struggling to figure out how to make a BAQ (and ultimately as dashboard) to show all customers that share an email address with another customer.
Here’s what ChatGPT suggests:
SELECT email, COUNT(*) AS count
FROM table_name
GROUP BY email
HAVING COUNT(*) > 1;
How would I do this in BAQ?
I created a query with a calculated field ‘CountEmail’, and added a subquery criteria of Count>1, which works to show all emails that are used more than once, along with their count.
Then I added CustID, which gave an aggregate error, so I checked group-by. Then 0 records are returned.
If you set your initial query as a CTE then create another query to pull Customer Name/Num and Email from the CTE I think that would work best (easiest?).
Edit: Make the second query a top-level query, and do a join on your CTE query to Customer in the new top-level query on the email address field.
Ken - I forgot about being able to use the “Calculated” table in a sub-query criteria so thank you - but I am not sure that gets the answer for Andris in this case because of the additional fields he wants to display. I “think” it breaks down on the “Group By” clause - since the additional fields (Customer ID/Name/etc) need to have Group By checked, nothing (or very few) would end up with a “Count” greater than 1. I feel like there is a way to overcome this with window functions like ‘over’ but we are quickly exceeding my knowledge. Is there another way around that?
You can make sure it works and then add to the subquery criteria to show where count is greater than 1. Which removes the use of the HAVING all together.
Who needs ChatGPT? lol epiUsers.help is so much better. am i right??
Thanks, Ken! I tried your final one, and it ran. Then I tried filtering out Count<=1 by adding a SubQuery Criteria of Count > 1, but that errors out also.
I checked ‘Having’ and that errors out the the aggregate warning again.
Edit: Sorry had not seen the response from Andris before I posted this
Nice! But apparently adding the subquery criteria throws an error of “Windowed functions can only appear in the SELECT or ORDER BY clauses.” So while Andris could probably set a filter on the eventual dashboard… instead I went to ChatGPT (why not) and it suggested a CTE or subquery I asked about doing it without one of those and it suggested a ‘derived table’ which did work fine in a SQL query but not sure how to do that in a BAQ. Here is the Select statement that worked:
SELECT
dt.Calculated_Count,
dt.Customer_CustID,
dt.Customer_CustNum,
dt.Customer_EMailAddress
FROM (
SELECT
COUNT(*) OVER (PARTITION BY Customer.EMailAddress) AS [Calculated_Count],
Customer.CustID AS [Customer_CustID],
Customer.CustNum AS [Customer_CustNum],
Customer.EMailAddress AS [Customer_EMailAddress]
FROM
Erp.Customer AS Customer
) AS dt
WHERE
dt.Calculated_Count > 1;
Awesome sauce Ken and Dan! Thank you very much for the advice and examples. I added an additional subquery criteria to filter out the 984k customers with an empty string email address.