When working with lists in Python, you may encounter situations where you need to remove duplicate elements while keeping the original order of the list intact. This is a common requirement when dealing with data processing and deduplication tasks.
In this article, we will explore simple and effective ways to remove duplicate elements from a list in Python while preserving order.
Using dict.fromkeys()
Python dictionaries maintain insertion order (an implementation detail in CPython 3.6, officially guaranteed by the language since Python 3.7). We can take advantage of this behavior using dict.fromkeys():
def remove_duplicates(lst):
    return list(dict.fromkeys(lst))
# Example usage
my_list = [1, 2, 3, 2, 1, 4, 5, 3]
unique_list = remove_duplicates(my_list)
print(unique_list) # Output: [1, 2, 3, 4, 5]
Code explained:
- dict.fromkeys(lst) creates a dictionary where the keys are the elements from lst. Since dictionaries do not allow duplicate keys, only the first occurrence of each element is kept.
- Converting the dictionary back to a list restores the original order of appearance.
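To make the two steps described above concrete, here is a minimal sketch that prints the intermediate dictionary before converting it back to a list. The variable names (items, as_dict) are only illustrative. The last part shows a limitation worth knowing: dictionary keys must be hashable, so this approach does not work on a list of lists.

```python
items = [1, 2, 3, 2, 1, 4, 5, 3]

# Step 1: every element becomes a key; the values default to None.
# A repeated element does not create a new key, so the first position wins.
as_dict = dict.fromkeys(items)
print(as_dict)  # {1: None, 2: None, 3: None, 4: None, 5: None}

# Step 2: iterating over a dict yields its keys in insertion order.
print(list(as_dict))  # [1, 2, 3, 4, 5]

# Limitation: keys must be hashable, so unhashable elements such as
# lists raise a TypeError.
try:
    dict.fromkeys([[1, 2], [1, 2]])
except TypeError as exc:
    print(f"TypeError: {exc}")
```

If the elements are unhashable, converting them to a hashable form first (for example, tuples instead of lists) is one possible workaround.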
Using a set and a Loop
Another, more explicit way to remove duplicate elements from a list in Python while keeping the order is to track the elements already seen in a set and iterate through the list manually:
def remove_duplicates(lst):
    seen = set()
    unique_list = []
    for item in lst:
        if item not in seen:
            unique_list.append(item)
            seen.add(item)
    return unique_list
# Example usage
my_list = ["apple", "banana", "apple", "orange", "banana", "grape"]
unique_list = remove_duplicates(my_list)
print(unique_list) # Output: ['apple', 'banana', 'orange', 'grape']
Code explained:
- A set (seen) stores already encountered elements.
- The for loop iterates through the list and adds only unseen elements to unique_list.
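One practical benefit of the explicit loop is that it is easy to adapt. The sketch below is a hypothetical variant, not part of the original function: remove_duplicates_by and its key parameter are names introduced here for illustration. It decides what counts as a duplicate by comparing a derived value (here, the lowercase form of each string) while still returning the original elements.

```python
def remove_duplicates_by(lst, key=lambda item: item):
    """Keep the first element for each distinct key(item), preserving order.

    Hypothetical helper for illustration; `key` works like the `key`
    argument of sorted().
    """
    seen = set()
    unique_list = []
    for item in lst:
        marker = key(item)  # the value used to decide whether item is a duplicate
        if marker not in seen:
            seen.add(marker)
            unique_list.append(item)
    return unique_list


# Example usage: treat strings that differ only in case as duplicates
fruits = ["Apple", "banana", "apple", "BANANA", "grape"]
print(remove_duplicates_by(fruits, key=str.lower))  # ['Apple', 'banana', 'grape']
```

With the default key, the function behaves like the plain set-and-loop version shown earlier.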
Using pandas (For Large Datasets)
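If your data already lives in pandas, or you are processing large datasets with it, pandas provides a concise way to remove duplicate elements while preserving order. For a plain Python list, keep in mind that converting to a Series adds overhead, so the approaches above are usually sufficient: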
import pandas as pd
def remove_duplicates(lst):
    # .tolist() converts NumPy scalars back to plain Python values
    return pd.Series(lst).drop_duplicates().tolist()
# Example usage
my_list = [10, 20, 10, 30, 40, 20, 50]
unique_list = remove_duplicates(my_list)
print(unique_list) # Output: [10, 20, 30, 40, 50]
Code explained:
- pd.Series(lst) converts the list into a pandas Series.
- .drop_duplicates() removes duplicates while preserving order.
We have discussed the following ways to remove duplicate elements from a list in Python while preserving order:
- dict.fromkeys() – The simplest and most Pythonic way, and my personal pick.
- A set and a loop – More explicit and easy to understand.
- pandas – A good fit when your data is already in pandas or you are handling large datasets.