String processing is a very common task, but Python has so many built-in string methods that they are easy to forget. For quick reference, this article gives an example of each built-in method, written against Python 3.5.1 and grouped by category.
Overview
String case conversion
- str.capitalize()
- str.lower()
- str.casefold()
- str.swapcase()
- str.title()
- str.upper()
String format output
- str.center(width[, fillchar])
- str.ljust(width[, fillchar]); str.rjust(width[, fillchar])
- str.zfill(width)
- str.expandtabs(tabsize=8)
- str.format(*args, **kwargs)
- str.format_map(mapping)
String search location and substitution
- str.count(sub[, start[, end]])
- str.find(sub[, start[, end]]); str.rfind(sub[, start[, end]])
- str.index(sub[, start[, end]]); str.rindex(sub[, start[, end]])
- str.replace(old, new[, count])
- str.lstrip([chars]); str.rstrip([chars]); str.strip([chars])
- static str.maketrans(x[, y[, z]]); str.translate(table)
Union and segmentation of strings
- str.join(iterable)
- str.partition(sep); str.rpartition(sep)
- str.split(sep=None, maxsplit=-1); str.rsplit(sep=None, maxsplit=-1)
- str.splitlines([keepends])
String conditional judgement
- str.endswith(suffix[, start[, end]]); str.startswith(prefix[, start[, end]])
- str.isalnum()
- str.isalpha()
- str.isdecimal(); str.isdigit(); str.isnumeric()
- str.isidentifier()
- str.islower()
- str.isprintable()
- str.isspace()
- str.istitle()
- str.isupper()
String encoding
- str.encode(encoding="utf-8", errors="strict")
String case conversion
str.capitalize()
Converts the first character to uppercase and the rest of the string to lowercase. Note that if the first character has no uppercase form, it is left as it is.
'adi dog'.capitalize()
# 'Adi dog'
'abcd 徐'.capitalize()
# 'Abcd 徐'
'徐 abcd'.capitalize()
# '徐 abcd'
'ß'.capitalize()
# 'SS'  (Python 3.5; since Python 3.8 the first character is title-cased instead, giving 'Ss')
str.lower()
Converts all cased characters in the string to lowercase. It does not perform full case folding, so a letter such as 'ß', which is already lowercase, is left unchanged.
'DOBI'.lower()
# 'dobi'
'ß'.lower()  # 'ß' is a German lowercase letter with an alternative lowercase form 'ss', which lower() does not produce
# 'ß'
'徐 ABCD'.lower()
# '徐 abcd'
str.casefold()
Converts the string to lowercase more aggressively than lower(): every character that has a case-folded form in Unicode is converted.
'DOBI'.casefold()
# 'dobi'
'ß'.casefold()  # the German lowercase letter ß is equivalent to the lowercase letters ss (its uppercase form is SS)
# 'ss'
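A typical use of casefold() is caseless comparison. The following is a minimal sketch; the helper name caseless_equal is made up for illustration and is not part of the standard library.

```python
def caseless_equal(a, b):
    """Compare two strings while ignoring case (illustrative helper)."""
    return a.casefold() == b.casefold()


# casefold() maps 'ß' to 'ss', so the two spellings compare equal.
print(caseless_equal('Straße', 'STRASSE'))

# lower() leaves 'ß' untouched, so the same comparison fails.
print('Straße'.lower() == 'STRASSE'.lower())
```

The first line prints True and the second prints False.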
str.swapcase()
Inverts the case of every letter in the string.
'徐Dobi a123 ß'.swapcase()
# '徐dOBI A123 SS'  (here ß is converted to SS, its uppercase form)
Note, however, that s.swapcase().swapcase() == s is not necessarily true:
u'\xb5'
# 'µ'
u'\xb5'.swapcase()
# 'Μ'
u'\xb5'.swapcase().swapcase()
# 'μ'
hex(ord(u'\xb5'.swapcase().swapcase()))
# '0x3bc'
Here 'Μ' is the Greek capital letter Mu, not the Latin M. Its lowercase form 'μ' (U+03BC) looks the same as the original micro sign 'µ' (U+00B5) but is a different character.
str.title()
Capitalizes the first letter of each word in the string. Words are delimited by whitespace and punctuation, so the result is wrong for English possessives and contractions and for all-capital abbreviations.
'Hello world'.title()
# 'Hello World'
'中文abc def 12gh'.title()
# '中文Abc Def 12Gh'
# But this method is not perfect:
"they're bill's friends from the UK".title()
# "They'Re Bill'S Friends From The Uk"
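One way around the apostrophe problem is a regular expression that treats an apostrophe inside a word as part of that word. The sketch below is adapted from the workaround suggested in the official documentation for str.title(); the function name titlecase is only illustrative, and the pattern handles ASCII letters only.

```python
import re


def titlecase(s):
    # A word is a run of ASCII letters, optionally followed by an
    # apostrophe and more letters. Uppercase its first letter and
    # lowercase everything after it.
    return re.sub(
        r"[A-Za-z]+('[A-Za-z]+)?",
        lambda mo: mo.group(0)[0].upper() + mo.group(0)[1:].lower(),
        s,
    )


fixed = titlecase("they're bill's friends from the UK")
print(fixed)
```

This prints They're Bill's Friends From The Uk. The contractions are now handled, but the abbreviation UK is still lowercased after its first letter, so abbreviations need separate treatment.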
str.upper()
Converts all letters in the string to uppercase; characters that have no uppercase form are left unchanged.
'中文abc def 12gh'.upper()
# '中文ABC DEF 12GH'
Note that s.upper().isupper() is not necessarily True, for example when s contains no cased characters.
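A quick check of when s.upper().isupper() fails, using a few made-up sample strings: isupper() requires at least one cased character, so strings made only of digits, CJK characters, or nothing at all give False.

```python
samples = ['dobi', '123', '中文', '']

# Pair each sample with the result of upper().isupper().
results = [(s, s.upper().isupper()) for s in samples]

for s, ok in results:
    print(repr(s), ok)
```

Only 'dobi' yields True here; the other three contain no cased characters.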
String format output
str.center(width[, fillchar])
Centers the string within the given width, padding both sides with the fill character (an ASCII space by default). If the width is less than the length of the string, the original string is returned.
'12345'.center(10, '*')
# '**12345***'
'12345'.center(10)
# '  12345   '
str.ljust(width[, fillchar]); str.rjust(width[, fillchar])
Returns a string of the specified width with the content left-justified (ljust) or right-justified (rjust). If the width is less than the length of the string, the original string is returned. The padding defaults to an ASCII space; a different fill character can be specified.
'dobi'.ljust(10)
# 'dobi      '
'dobi'.ljust(10, '~')
# 'dobi~~~~~~'
'dobi'.ljust(3, '~')
# 'dobi'
'dobi'.ljust(3)
# 'dobi'
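ljust and rjust are handy together for lining up simple text columns. A small sketch with made-up data:

```python
rows = [('apple', 3), ('watermelon', 12), ('fig', 150)]

# Left-justify the name in 12 columns (padded with dots) and
# right-justify the count in 5 columns (padded with spaces).
lines = [name.ljust(12, '.') + str(count).rjust(5) for name, count in rows]

for line in lines:
    print(line)
```

Every output line is 17 characters wide, so the counts end in the same column.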
str.zfill(width)
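Pads the string on the left with '0' to the specified width and returns it. A leading sign ('+' or '-') stays in front of the padding, and the original string is returned if the width is less than its length.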
"42".zfill(5)
# '00042'
"-42".zfill(5)
# '-0042'
'dd'.zfill(5)
# '000dd'
'--'.zfill(5)
# '-000-'
' '.zfill(5)
# '0000 '
''.zfill(5)
# '00000'
'dddddddd'.zfill(5)
# 'dddddddd'
str.expandtabs(tabsize=8)
Replaces each horizontal tab with spaces so that the text following it starts at the next column that is a multiple of tabsize.
tab = '1\t23\t456\t7890\t1112131415\t161718192021'
tab.expandtabs()
# '1       23      456     7890    1112131415      161718192021'
# '123456781234567812345678123456781234567812345678'  Note the relationship between the number of spaces and the output position above.
tab.expandtabs(4)
# '1   23  456 7890    1112131415  161718192021'
# '12341234123412341234123412341234'
str.format(*args, **kwargs)
The format string syntax is quite varied and is covered with detailed examples in the "Format examples" section of the official documentation, which interested readers can consult directly.
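For quick orientation only, a few of the most common format patterns are sketched below. The values are made up for illustration, and the official documentation remains the complete reference.

```python
examples = [
    '{} is {} years old'.format('john', 56),     # positional, in order
    '{1}, {0}'.format('a', 'b'),                 # positional, by index
    '{name}:{age}'.format(name='john', age=56),  # keyword arguments
    '{:>8}'.format('dobi'),                      # right-align in 8 columns
    '{:*^10}'.format('dobi'),                    # center, padded with '*'
    '{:.2f}'.format(3.14159),                    # two decimal places
    '{:05d}'.format(42),                         # zero-pad to width 5
    '{:,}'.format(1234567),                      # thousands separator
]

for e in examples:
    print(repr(e))
```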
str.format_map(mapping)
Similar to str.format(*args, **kwargs); the difference is that mapping is a dictionary object, which is used directly rather than unpacked into keyword arguments.
People = {'name':'john', 'age':56}
'My name is {name},i am {age} old'.format_map(People)
# 'My name is john,i am 56 old'
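Because format_map uses the mapping object itself, a dict subclass can control what happens for missing keys. The sketch below follows the idea of the example in the official documentation; the class name Default and the '<key>' placeholder style are illustrative choices.

```python
class Default(dict):
    def __missing__(self, key):
        # Called by dict lookup when the key is absent.
        return '<' + key + '>'


people = Default(name='john')
template = 'My name is {name}, I am {age} years old'

result = template.format_map(people)
print(result)

# format(**people) unpacks into a plain dict, so __missing__ is lost.
try:
    template.format(**people)
except KeyError as exc:
    print('KeyError:', exc)
```

The first call prints My name is john, I am <age> years old, while the second raises KeyError for 'age'.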
String search location and substitution
str.count(sub[, start[, end]])
text = 'outer protective covering'
text.count('e')
# 4
text.count('e', 5, 11)
# 1
text.count('e', 5, 10)
# 0
str.find(sub[, start[, end]]); str.rfind(sub[, start[, end]])
text = 'outer protective covering'
text.find('er')
# 3
text.find('to')
# -1
text.find('er', 3)
# 3
text.find('er', 4)
# 20
text.find('er', 4, 21)
# -1
text.find('er', 4, 22)
# 20
text.rfind('er')
# 20
text.rfind('er', 20)
# 20
text.rfind('er', 20, 21)
# -1
str.index(sub[, start[, end]]); str.rindex(sub[, start[, end]])
Similar to find() and rfind(); the difference is that a ValueError is raised if the substring is not found.
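A short sketch of the difference in practice, reusing the text from the find() examples. The helper name locate is made up for illustration.

```python
text = 'outer protective covering'


def locate(haystack, needle):
    """Return the lowest index of needle, or None if it is absent."""
    try:
        return haystack.index(needle)
    except ValueError:
        # index() raises where find() would have returned -1.
        return None


print(locate(text, 'er'))
print(locate(text, 'to'))
print(text.rindex('er'))
```

This prints 3, None, and 20.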
str.replace(old, new[, count])
'dog wow wow jiao'.replace('wow', 'wang')
# 'dog wang wang jiao'
'dog wow wow jiao'.replace('wow', 'wang', 1)
# 'dog wang wow jiao'
'dog wow wow jiao'.replace('wow', 'wang', 0)
# 'dog wow wow jiao'
'dog wow wow jiao'.replace('wow', 'wang', 2)
# 'dog wang wang jiao'
'dog wow wow jiao'.replace('wow', 'wang', 3)
# 'dog wang wang jiao'
str.lstrip([chars]); str.rstrip([chars]); str.strip([chars])
' dobi'.lstrip()
# 'dobi'
'db.kun.ac.cn'.lstrip('dbk')
# '.kun.ac.cn'
' dobi '.rstrip()
# ' dobi'
'db.kun.ac.cn'.rstrip('acn')
# 'db.kun.ac.'
' dobi '.strip()
# 'dobi'
'db.kun.ac.cn'.strip('db.c')
# 'kun.ac.cn'
'db.kun.ac.cn'.strip('cbd.un')
# 'kun.a'
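As the examples show, chars is treated as a set of characters, not as a prefix or suffix. This is a common source of bugs when removing a file extension. The helpers remove_prefix and remove_suffix below are hypothetical names for a sketch that works on Python 3.5; Python 3.9 later added str.removeprefix() and str.removesuffix() for this purpose.

```python
# rstrip removes any trailing '.', 't', or 'x', including the 't' of 'test'.
stripped = 'test.txt'.rstrip('.txt')
print(stripped)


def remove_prefix(s, prefix):
    """Remove prefix once, only if s really starts with it."""
    return s[len(prefix):] if s.startswith(prefix) else s


def remove_suffix(s, suffix):
    """Remove suffix once, only if s really ends with it."""
    return s[:-len(suffix)] if suffix and s.endswith(suffix) else s


print(remove_suffix('test.txt', '.txt'))
print('abcabc.txt'.lstrip('abc'), remove_prefix('abcabc.txt', 'abc'))
```

The first print gives tes, the second gives test, and the last line gives .txt abc.txt.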
static str.maketrans(x[, y[, z]]); str.translate(table)
maketrans is a static method that generates a translation table for translate to use.
If maketrans is given only one argument, it must be a dictionary. Each key is either a Unicode code point (an integer) or a string of length 1; each value can be any string, None, or a Unicode code point.
a = 'dobi'
ord('o')
# 111
ord('a')
# 97
hex(ord('狗'))
# '0x72d7'
b = {'d':'dobi', 111:' is ', 'b':97, 'i':'\u72d7\u72d7'}
table = str.maketrans(b)
a.translate(table)
# 'dobi is a狗狗'
If maketrans is given two arguments, they must be strings of equal length, and each character of the first is mapped to the character at the same position in the second. If there is a third argument, it must also be a string, and each of its characters is mapped to None, that is, deleted:
a = 'dobi is a dog'
table = str.maketrans('dobi', 'alph')
a.translate(table)
# 'alph hs a alg'
table = str.maketrans('dobi', 'alph', 'o')
a.translate(table)
# 'aph hs a ag'
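A common application of the three-argument form is deleting a whole class of characters. The sketch below removes ASCII punctuation, using string.punctuation from the standard library; the sample sentence is made up.

```python
import string

# Empty first and second arguments: nothing is remapped.
# Third argument: every ASCII punctuation character is deleted.
table = str.maketrans('', '', string.punctuation)

cleaned = 'Hello, world! (dobi is a dog.)'.translate(table)
print(cleaned)
```

This prints Hello world dobi is a dog.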
Union and segmentation of strings
str.join(iterable)
Joins the elements of an iterable into a single string, using the specified string as the separator. Every element must be a string.
'-'.join(['2012', '3', '12'])
# '2012-3-12'
'-'.join([2012, 3, 12])
# TypeError: sequence item 0: expected str instance, int found
'-'.join(['2012', '3', b'12'])  # bytes is not str
# TypeError: sequence item 2: expected str instance, bytes found
'-'.join(['2012'])
# '2012'
'-'.join([])
# ''
'-'.join([None])
# TypeError: sequence item 0: expected str instance, NoneType found
'-'.join([''])
# ''
','.join({'dobi':'dog', 'polly':'bird'})
# 'dobi,polly'
','.join({'dobi':'dog', 'polly':'bird'}.values())
# 'dog,bird'
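Since join raises TypeError for non-string elements, the usual approach is to convert each element first. A minimal sketch:

```python
date_parts = [2012, 3, 12]

# Convert every element to str before joining.
joined = '-'.join(str(part) for part in date_parts)

# map(str, ...) is an equivalent spelling.
also_joined = '-'.join(map(str, date_parts))

print(joined, also_joined)
```

Both expressions give '2012-3-12'.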
str.partition(sep); str.rpartition(sep)
'dog wow wow jiao'.partition('wow')
# ('dog ', 'wow', ' wow jiao')
'dog wow wow jiao'.partition('dog')
# ('', 'dog', ' wow wow jiao')
'dog wow wow jiao'.partition('jiao')
# ('dog wow wow ', 'jiao', '')
'dog wow wow jiao'.partition('ww')
# ('dog wow wow jiao', '', '')
'dog wow wow jiao'.rpartition('wow')
# ('dog wow ', 'wow', ' jiao')
'dog wow wow jiao'.rpartition('dog')
# ('', 'dog', ' wow wow jiao')
'dog wow wow jiao'.rpartition('jiao')
# ('dog wow wow ', 'jiao', '')
'dog wow wow jiao'.rpartition('ww')
# ('', '', 'dog wow wow jiao')
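Because partition always returns a 3-tuple, it is convenient for splitting once at a separator and checking whether the separator was present. The function name parse_setting and the sample inputs are made up for this sketch.

```python
def parse_setting(line):
    """Split 'key = value' at the first '='; return None if there is no '='."""
    key, sep, value = line.partition('=')
    if not sep:
        # An empty separator in the result means '=' was not found.
        return None
    return key.strip(), value.strip()


print(parse_setting('name = john=doe'))
print(parse_setting('no separator here'))

# rpartition splits at the last occurrence, e.g. to get a file extension.
stem, dot, ext = 'archive.tar.gz'.rpartition('.')
print(stem, ext)
```

This prints ('name', 'john=doe'), then None, then archive.tar gz.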
str.split(sep=None, maxsplit=-1); str.rsplit(sep=None, maxsplit=-1)
'1,2,3'.split(','), '1, 2, 3'.rsplit()
# (['1', '2', '3'], ['1,', '2,', '3'])
'1,2,3'.split(',', maxsplit=1), '1,2,3'.rsplit(',', maxsplit=1)
# (['1', '2,3'], ['1,2', '3'])
'1 2 3'.split(), '1 2 3'.rsplit()
# (['1', '2', '3'], ['1', '2', '3'])
'1 2 3'.split(maxsplit=1), '1 2 3'.rsplit(maxsplit=1)
# (['1', '2 3'], ['1 2', '3'])
' 1 2 3 '.split()
# ['1', '2', '3']
'1,2,,3,'.split(','), '1,2,,3,'.rsplit(',')
# (['1', '2', '', '3', ''], ['1', '2', '', '3', ''])
''.split()
# []
''.split('a')
# ['']
'bcd'.split('a')
# ['bcd']
'bcd'.split(None)
# ['bcd']
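One point the examples above imply but that is easy to miss: split() with no separator collapses runs of whitespace and drops leading and trailing whitespace, while split(' ') splits at every single space and keeps the empty strings. A small sketch with a made-up input:

```python
line = 'a  b '  # two spaces between a and b, one trailing space

by_whitespace = line.split()
by_single_space = line.split(' ')

print(by_whitespace)
print(by_single_space)
```

The first result is ['a', 'b'] and the second is ['a', '', 'b', ''].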
str.splitlines([keepends])
Splits the string into a list of lines, breaking at line boundaries. If keepends is True, the line-boundary characters are kept in the resulting lines; the full list of recognized line boundaries is given in the official documentation.
'ab c\n\nde fg\rkl\r\n'.splitlines()
# ['ab c', '', 'de fg', 'kl']
'ab c\n\nde fg\rkl\r\n'.splitlines(keepends=True)
# ['ab c\n', '\n', 'de fg\r', 'kl\r\n']
"".splitlines(), ''.split('\n')  # note the difference between the two
# ([], [''])
"One line\n".splitlines(), 'Two lines\n'.split('\n')
# (['One line'], ['Two lines', ''])
String conditional judgement
str.endswith(suffix[, start[, end]]); str.startswith(prefix[, start[, end]])
text = 'outer protective covering'
text.endswith('ing')
# True
text.endswith(('gin', 'ing'))
# True
text.endswith('ter', 2, 5)
# True
text.endswith('ter', 2, 4)
# False
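The tuple form is useful for checking several alternatives at once, for example when filtering file names by extension. The file names and URL below are made up for this sketch.

```python
names = ['cat.jpg', 'notes.txt', 'dog.PNG', 'tree.png']

# Lowercase first so that '.PNG' matches too, then test both suffixes.
images = [n for n in names if n.lower().endswith(('.jpg', '.png'))]
print(images)

# startswith accepts a tuple in the same way.
url = 'https://example.com/index.html'
is_web = url.startswith(('http://', 'https://'))
print(is_web)
```

This prints ['cat.jpg', 'dog.PNG', 'tree.png'] and True.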
str.isalnum()
True if the string is non-empty and consists of any combination of letters and digits. In short, a character c is alphanumeric as long as any one of c.isalpha(), c.isdecimal(), c.isdigit(), or c.isnumeric() is true.
'dobi'.isalnum()
# True
'dobi123'.isalnum()
# True
'123'.isalnum()
# True
'徐'.isalnum()
# True
'dobi_123'.isalnum()
# False
'dobi 123'.isalnum()
# False
'%'.isalnum()
# False
str.isalpha()
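True if the string is non-empty and every character is defined as a "Letter" in the Unicode character database, that is, has the general category "Lm", "Lt", "Lu", "Ll", or "Lo". Note that this is different from the "Alphabetic" property defined in the Unicode Standard.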
'dobi'.isalpha()
# True
'do bi'.isalpha()
# False
'dobi123'.isalpha()
# False
'徐'.isalpha()
# True
str.isdecimal(); str.isdigit(); str.isnumeric()
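The three methods differ in which Unicode general categories they accept:
isdecimal: Nd
isdigit: No, Nd
isnumeric: No, Nd, Nl
This is an approximation: the official documentation defines isdigit and isnumeric through the Unicode Numeric_Type property, so not every No character is a digit, and isnumeric also accepts a few characters outside these categories, such as the CJK numeral '十' (category Lo).
The difference between digit and decimal is that some numeric characters are digits but not decimals, for example the superscript two '²'.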
num = '\u2155'
print(num)
# ⅕
num.isdecimal(), num.isdigit(), num.isnumeric()
# (False, False, True)
num = '\u00B2'
print(num)
# ²
num.isdecimal(), num.isdigit(), num.isnumeric()
# (False, True, True)
num = "1"  # str (Unicode)
num.isdecimal(), num.isdigit(), num.isnumeric()
# (True, True, True)
num = 'Ⅶ'
num.isdecimal(), num.isdigit(), num.isnumeric()
# (False, False, True)
num = '十'
num.isdecimal(), num.isdigit(), num.isnumeric()
# (False, False, True)
num = b"1"  # bytes
num.isdigit()  # True
num.isdecimal()  # AttributeError: 'bytes' object has no attribute 'isdecimal'
num.isnumeric()  # AttributeError: 'bytes' object has no attribute 'isnumeric'
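To see how the three methods line up with the Unicode general category, the characters used above can be tabulated with unicodedata.category() from the standard library. This is only a sketch over five sample characters, not a full survey.

```python
import unicodedata

# '1', superscript two, vulgar fraction one fifth, Roman numeral seven, CJK ten
chars = ['1', '\u00b2', '\u2155', '\u2166', '\u5341']

rows = [
    (ch, unicodedata.category(ch), ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    for ch in chars
]

for row in rows:
    print(row)
```

Only '1' (Nd) passes all three tests, '²' (No) is a digit but not a decimal, and '⅕' (No), 'Ⅶ' (Nl), and '十' (Lo) are numeric only.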
str.isidentifier()
Determines whether the string is a valid identifier. Reserved words such as def and with also count as valid identifiers.
'def'.isidentifier()
# True
'with'.isidentifier()
# True
'false'.isidentifier()
# True
'dobi_123'.isidentifier()
# True
'dobi 123'.isidentifier()
# False
'123'.isidentifier()
# False
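Since isidentifier() returns True for reserved words, a check for a name that can actually be used as a variable usually combines it with keyword.iskeyword() from the standard library. The helper name is_usable_name is made up for this sketch.

```python
import keyword


def is_usable_name(s):
    """True if s is a valid identifier and not a reserved word."""
    return s.isidentifier() and not keyword.iskeyword(s)


for name in ['def', 'with', 'false', 'dobi_123', '123']:
    print(name, is_usable_name(name))
```

Here 'def' and 'with' are rejected as keywords, '123' is rejected as an invalid identifier, and 'false' and 'dobi_123' are accepted ('false' is not a keyword; only False is).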
str.islower()
'徐'.islower()
# False
'\u1e9e'.islower()  # 'ẞ', the German capital letter sharp s
# False
'a徐'.islower()
# True
'ss'.islower()
# True
'23'.islower()
# False
'Ab'.islower()
# False
str.isprintable()
True if all characters in the string are printable or the string is empty. Characters in the Unicode "Other" and "Separator" categories are non-printable, with the exception of the ASCII space (0x20).
'dobi123'.isprintable()
# True
'dobi123\n'.isprintable()
# False
'dobi 123'.isprintable()
# True
'dobi.123'.isprintable()
# True
''.isprintable()
# True
str.isspace()
True if there is at least one character in the string and all characters are whitespace.
'\r\n\t'.isspace()
# True
''.isspace()
# False
' '.isspace()
# True
str.istitle()
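Determines whether the string is title-cased: each word starts with an uppercase letter and the rest of its letters are lowercase. Non-alphabetic characters are ignored, and the string must contain at least one cased character.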
'How Python Works'.istitle()
# True
'How Python WORKS'.istitle()
# False
'how python works'.istitle()
# False
'How Python Works'.istitle()
# True
' '.istitle()
# False
''.istitle()
# False
'A'.istitle()
# True
'a'.istitle()
# False
'Toss Abc Def 123'.istitle()
# True
str.isupper()
'徐'.isupper()
# False
'DOBI'.isupper()
# True
'Dobi'.isupper()
# False
'DOBI123'.isupper()
# True
'DOBI 123'.isupper()
# True
'DOBI\t 123'.isupper()
# True
'DOBI_123'.isupper()
# True
'_123'.isupper()
# False
String encoding
str.encode(encoding="utf-8", errors="strict")
fname = '徐'
fname.encode('ascii')
# UnicodeEncodeError: 'ascii' codec can't encode character '\u5f90'...
fname.encode('ascii', 'replace')
# b'?'
fname.encode('ascii', 'ignore')
# b''
fname.encode('ascii', 'xmlcharrefreplace')
# b'&#24464;'
fname.encode('ascii', 'backslashreplace')
# b'\\u5f90'
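For comparison, an encoding that can represent the character needs no error handler at all. The sketch below round-trips the same string through UTF-8 with bytes.decode().

```python
fname = '徐'

# UTF-8 can represent every Unicode character, so no error handler is needed.
data = fname.encode('utf-8')
print(data)

# One character becomes three bytes.
print(len(fname), len(data))

# Decoding with the same codec restores the original string.
restored = data.decode('utf-8')
print(restored == fname)
```

The encoded value is b'\xe5\xbe\x90', the lengths are 1 and 3, and the round trip compares equal.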
Reference material
Python official documentation: Built-in Types, String Methods