Working with CMS Data

Find Duplicate CMS Items

Published
April 12, 2023

How do you find CMS items that share the same value, such as a URL, in the same field?

For this discussion, we'll assume you have more than 1,000 items.

The Problem

Webflow's ability to query the API for specific content is pretty much non-existent, so a task like finding duplicate values in a field is an adventure in creative problem solving.

Here are several approaches you can take depending on your needs.

For one-time, occasional checks

OPTION 1 - Use a Spreadsheet or Database tool

Download the CSV, load it into a spreadsheet, and use the spreadsheet's own tools to find duplicates. This still has some hurdles: most spreadsheet solutions do this with conditional formatting, which means you still have to read through the whole sheet to find the highlighted duplicate rows.

Loading the CSV into a tool like Access or Airtable gives you better tools for finding the duplicates, but the load process is generally a bit more work because you need to specify field types.

OPTION 2 - Use Python or Awk

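Download the CSV and use a command-line tool like awk, or load it into a pandas DataFrame in Python.

This requires light programming knowledge and the necessary tools.

Python, untested:

import pandas as pd

# 'URL' is a placeholder; use the header of the column you are checking
df = pd.read_csv('myfile.csv')
# keep=False flags every row that shares a value, not just the repeats
duplicates = df[df.duplicated(subset=['URL'], keep=False)]
print(duplicates)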

Awk, untested. This checks the first column ($1) and prints the second row found for each repeated value:

awk -F "," '{if (a[$1]++ == 1) print}' myfile.csv
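The awk one-liner splits on every comma, so a quoted field that itself contains a comma would shift the columns. Here is a minimal sketch of the same check using Python's standard csv module, which parses quoted fields. The column names `Slug` and `URL`, the function name `find_duplicates`, and the sample rows are assumptions for illustration; substitute the headers from your own export.

```python
import csv
import io
from collections import defaultdict


def find_duplicates(csv_file, field="URL", id_field="Slug"):
    """Map each repeated value in `field` to the `id_field` of every row that has it."""
    seen = defaultdict(list)
    for row in csv.DictReader(csv_file):
        value = (row.get(field) or "").strip()
        if value:  # skip rows where the field is empty
            seen[value].append(row.get(id_field, ""))
    # Keep only values that occur on more than one row
    return {value: ids for value, ids in seen.items() if len(ids) > 1}


# Small in-memory sample standing in for the exported CSV (hypothetical data)
sample = io.StringIO(
    'Name,Slug,URL\n'
    '"Widget, large",widget-large,https://example.com/a\n'
    'Gadget,gadget,https://example.com/b\n'
    'Widget copy,widget-copy,https://example.com/a\n'
)
dupes = find_duplicates(sample)
for value, slugs in dupes.items():
    print(value, "->", ", ".join(slugs))
```

With a real export you would pass an open file instead of the sample, for example `open("myfile.csv", newline="", encoding="utf-8")`.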

For regular or automated checks

OPTION 3 - Use a Webflow page + tools & JavaScript

Build a special page in your Webflow site that loads all 1,000+ items, showing just the slug field and the field(s) you need to monitor for duplicates. You would probably sort on the field you are checking, then have a script iterate from the end of the list and delete any non-duplicates, so that only the duplicates remain. Visit the page any time you need to check. There are some challenges here: getting all of the data in requires a tool like Finsweet's CMS Load More, and the script has to wait for the data to finish loading before it does the sort & delete.
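The sort-and-delete pass can be sketched as follows. This is written in Python only to stay consistent with the other examples here; on the actual page the same logic would run as browser JavaScript against the rendered list items, once everything has loaded. The function name `keep_only_duplicates` and the (slug, value) pairs are hypothetical.

```python
def keep_only_duplicates(items):
    """items: list of (slug, value) pairs. Returns only the pairs whose value repeats."""
    # Sorting puts identical values next to each other
    ordered = sorted(items, key=lambda item: item[1])
    # Walk from the end so deleting an entry never shifts the ones still to visit
    for i in range(len(ordered) - 1, -1, -1):
        value = ordered[i][1]
        same_as_prev = i > 0 and ordered[i - 1][1] == value
        same_as_next = i + 1 < len(ordered) and ordered[i + 1][1] == value
        if not (same_as_prev or same_as_next):
            del ordered[i]  # no neighbour shares this value, so it is not a duplicate
    return ordered


# Hypothetical items standing in for the loaded collection list
items = [
    ("post-a", "https://example.com/1"),
    ("post-b", "https://example.com/2"),
    ("post-c", "https://example.com/1"),
    ("post-d", "https://example.com/3"),
]
remaining = keep_only_duplicates(items)
for slug, value in remaining:
    print(slug, value)
```

Iterating backwards is the design choice that matters: removing an element only changes the positions of entries that have already been checked.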

OPTION 4 - Sync the CMS to Airtable and automate the check

If you need a real-time or automated solution for monitoring duplicates, you can use Whalesync or Powerimporter Pro to sync your CMS tables with Airtable, then have an automated process run the check there and alert you to any dupes.
