How to Remove Duplicate Elements from a List in Python While Preserving Order

When working with lists in Python, you may encounter situations where you need to remove duplicate elements while keeping the original order of the list intact. This is a common requirement when dealing with data processing and deduplication tasks.

In this article, we will explore simple and effective ways to remove duplicate elements from a list in Python while preserving order.

Using dict.fromkeys()

Python dictionaries preserve insertion order. This was an implementation detail of CPython 3.6 and has been an official language guarantee since Python 3.7. We can take advantage of this behavior using dict.fromkeys():

def remove_duplicates(lst):
    return list(dict.fromkeys(lst))

# Example usage
my_list = [1, 2, 3, 2, 1, 4, 5, 3]
unique_list = remove_duplicates(my_list)
print(unique_list)  # Output: [1, 2, 3, 4, 5]

Code explained:

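dict.fromkeys(lst) creates a dictionary whose keys are the elements of lst, each mapped to None. Since dictionaries do not allow duplicate keys, a repeated element does not create a new entry, so each key stays at the position of its first occurrence.
list() then iterates over the keys in insertion order, so the result keeps the order of first appearance. Note that every element must be hashable, so this does not work on a list of lists or dicts.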
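To make the explanation above concrete, here is a minimal sketch that prints the intermediate dictionary and shows what happens with unhashable elements. The variable names (intermediate, nested) are only illustrative.

```python
my_list = [1, 2, 3, 2, 1, 4, 5, 3]

# Every element becomes a key mapped to None. A repeated element
# hits an existing key, so its original position is unchanged.
intermediate = dict.fromkeys(my_list)
print(intermediate)  # {1: None, 2: None, 3: None, 4: None, 5: None}

# Iterating a dict yields its keys in insertion order.
unique_list = list(intermediate)
print(unique_list)  # [1, 2, 3, 4, 5]

# Unhashable elements (such as lists) cannot be dictionary keys.
nested = [[1, 2], [1, 2], [3]]
try:
    dict.fromkeys(nested)
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```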

Using a set and a Loop

A more explicit way to remove duplicate elements from a list in Python while keeping the order is to track the elements you have already seen in a set and iterate through the list manually:

def remove_duplicates(lst):
    seen = set()
    unique_list = []
    for item in lst:
        if item not in seen:
            unique_list.append(item)
            seen.add(item)
    return unique_list

# Example usage
my_list = ["apple", "banana", "apple", "orange", "banana", "grape"]
unique_list = remove_duplicates(my_list)
print(unique_list)  # Output: ['apple', 'banana', 'orange', 'grape']

Code explained:

A set (seen) stores the elements already encountered, so each membership check takes constant time on average.
The for loop iterates through the list and appends an element to unique_list only the first time it is seen.
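One benefit of the explicit loop is that it is easy to adapt. As a sketch, the variation below decides what counts as a duplicate through a key function. The helper name remove_duplicates_by and its key parameter are hypothetical and not part of the standard library.

```python
def remove_duplicates_by(lst, key):
    """Hypothetical helper: keep the first item for each distinct key(item)."""
    seen = set()
    unique_list = []
    for item in lst:
        marker = key(item)  # the value used to decide whether item is a duplicate
        if marker not in seen:
            unique_list.append(item)
            seen.add(marker)
    return unique_list

# Case-insensitive deduplication: the first spelling encountered wins.
fruits = ["Apple", "banana", "APPLE", "orange", "Banana"]
print(remove_duplicates_by(fruits, key=str.lower))  # ['Apple', 'banana', 'orange']

# Lists are unhashable, but converting each one to a tuple gives a hashable marker.
pairs = [[1, 2], [1, 2], [3]]
print(remove_duplicates_by(pairs, key=tuple))  # [[1, 2], [3]]
```

The original items are returned unchanged; only the marker stored in seen is transformed.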

Using pandas (For Large Datasets)

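If you’re working with large datasets, or your data already lives in pandas, the library provides a quick way to remove duplicate elements while preserving order. For a small plain list, the cost of importing pandas and converting the list to a Series usually outweighs any benefit: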

import pandas as pd

def remove_duplicates(lst):
    return pd.Series(lst).drop_duplicates().tolist()

# Example usage
my_list = [10, 20, 10, 30, 40, 20, 50]
unique_list = remove_duplicates(my_list)
print(unique_list)  # Output: [10, 20, 30, 40, 50]

Code explained:

pd.Series(lst) converts the list into a pandas Series.
.drop_duplicates() keeps the first occurrence of each value by default and drops later repeats, so the original order is preserved.
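drop_duplicates() also accepts a keep parameter that controls which occurrence survives. The sketch below, which assumes pandas is installed, shows its three settings on the same sample list.

```python
import pandas as pd

values = pd.Series([10, 20, 10, 30, 40, 20, 50])

# Default: keep the first occurrence of each value.
first = values.drop_duplicates(keep="first").tolist()
print(first)  # [10, 20, 30, 40, 50]

# Keep the last occurrence instead; each value sits where it last appeared.
last = values.drop_duplicates(keep="last").tolist()
print(last)  # [10, 30, 40, 20, 50]

# keep=False drops every value that appears more than once.
only_unique = values.drop_duplicates(keep=False).tolist()
print(only_unique)  # [30, 40, 50]
```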

We have discussed three ways to remove duplicate elements from a list in Python while preserving order:

dict.fromkeys() – The simplest and most Pythonic way, and my personal pick.
A set and a loop – More explicit and easy to understand.
pandas – Convenient when the dataset is large or the data is already in a Series or DataFrame.