Python strings, string manipulation functions, and string-related operations

String introduction

Python string representation

In addition to numbers, Python can also work with strings.

>>> 'spam eggs'
'spam eggs'
>>> 'doesn\'t'
"doesn't"
>>> "doesn't"
"doesn't"
>>> '"Yes," he said.'
'"Yes," he said.'
>>> "\"Yes,\" he said."
'"Yes," he said.'
>>> '"Isn\'t," she said.'
'"Isn\'t," she said.'

The interpreter echoes a string in the same form as it is typed: enclosed in quotes, with quotes and other special characters escaped by backslashes. If the string contains a single quote and no double quotes, it is enclosed in double quotes; otherwise it is enclosed in single quotes. Printing the string with print (introduced later) shows it without the enclosing quotes and escapes.

String literals can span multiple lines: a backslash at the end of a line continues the literal on the next line. The leading whitespace of the continuation line is not ignored.

        hello = "This is a rather long string containing\n\
        several lines of text just as you would do in C.\n\
            Note that whitespace at the beginning of the line is\
         significant.\n"
        print(hello)

The result is

        This is a rather long string containing
        several lines of text just as you would do in C.
            Note that whitespace at the beginning of the line is significant.

For very long strings (such as a few paragraphs of usage instructions), ending every line with \n\ as above is tedious, especially since you cannot re-wrap the text with a powerful editor like Emacs. In this case you can use triple double quotes, for example:

        hello = """

            This string is bounded by triple double quotes (3 times ").
        Unescaped newlines in the string are retained, though \
        it is still possible\nto use all normal escape sequences.

            Whitespace at the beginning of a line is
        significant.  If you need to include three opening quotes
        you have to escape at least one of them, e.g. \""".

            This string ends in a newline.
        """

Triple-quoted strings can also be written with three single quotes, with no difference in meaning.

String literals that span several source lines can also be joined directly: adjacent string literals separated only by whitespace are concatenated automatically at compile time. This lets you build a long string without sacrificing indentation alignment or performance, unlike joining with +, which is evaluated at run time, or a line continuation inside a string, where the leading whitespace of each continued line becomes part of the string.
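As a small illustration of the automatic joining described above, the sketch below builds the same text twice, once with adjacent literals and once with +. The variable names are made up for this example.

```python
# Adjacent string literals are merged by the compiler, so no + is needed.
message = ("This is a rather long string "
           "split across two source lines.")

# The equivalent concatenation with + produces the same value.
same_message = "This is a rather long string " + "split across two source lines."

print(message)
print(message == same_message)
```

The parentheses are only there so that the expression can span two lines; the indentation of the second literal does not end up in the string.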

String concatenation and repetition

Strings can be concatenated with the + operator and repeated with *.

>>> word = 'Help' + 'A'
>>> word
'HelpA'
>>> '<' + word*5 + '>'
'<HelpAHelpAHelpAHelpAHelpA>'

Several ways to concatenate strings

def concat1(x, y):
    z = x + y
    return z

def concat2(x, y):
    z = "%s%s" % (x, y)
    return z

def concat3(x, y):
    z = "{}{}".format(x, y)
    return z

def concat4(x, y):
    z = "{0}{1}".format(x, y)
    return z

[Fast string concatenation in Python]

String index

Strings can be indexed (subscripted) as in C; the first character of a string has index 0.

Python has no separate character type; a character is simply a string of length one. As in Icon, substrings can be specified with slice notation: two indices separated by a colon.

>>> word[4]
'A'
>>> word[0:2]
'He'
>>> word[2:4]
'lp'

Slice indices have useful defaults: an omitted first index defaults to zero, and an omitted second index defaults to the length of the string.

>>> word[:2]    # The first two characters
'He'
>>> word[2:]    # Everything except the first two characters
'lpA'

Note that s[:i] + s[i:] equals s, which is a useful invariant of slice operations.

>>> word[:2] + word[2:]
'HelpA'
>>> word[:3] + word[3:]
'HelpA'

Out-of-range slice indices are handled gracefully: an index that is too large is replaced by the string length, and an upper bound smaller than the lower bound returns an empty string.

>>> word[1:100]
'elpA'
>>> word[10:]
''
>>> word[2:1]
''

Indices may be negative, in which case counting starts from the right. For example:

>>> word[-1]     # The last character
'A'
>>> word[-2]     # The second-to-last character
'p'
>>> word[-2:]    # The last two characters
'pA'
>>> word[:-2]    # Everything except the last two characters
'Hel'

Note, however, that -0 is really the same as 0, so it does not count from the right:

>>> word[-0]     # (since -0 equals 0)
'H'

Out-of-range negative slice indices are truncated, but this does not apply to single-element (non-slice) indices:

>>> word[-100:]
'HelpA'
>>> word[-10]    # error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

The best way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. The right edge of the last character of a string of n characters then has index n, for example:

 +---+---+---+---+---+ 
 | H | e | l | p | A |
 +---+---+---+---+---+ 
 0   1   2   3   4   5 
-5  -4  -3  -2  -1

The first row of numbers gives the positions of indices 0 to 5 in the string; the second row gives the corresponding negative indices. The slice from i to j consists of all characters between the boundaries labeled i and j.

For non-negative indices that are both within bounds, the length of a slice is the difference of the indices. For example, the length of word[1:3] is 2.
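The slice rules above can be checked with a few lines of code. This is only a sketch that reuses the word value from the earlier examples.

```python
word = 'HelpA'

# s[:i] + s[i:] reproduces s for any i, including out-of-range values.
for i in range(-7, 8):
    assert word[:i] + word[i:] == word

# For in-bounds, non-negative indices the slice length is the index difference.
print(len(word[1:3]))   # 2

# An upper bound that is too large is clamped to the string length.
print(word[1:100])      # elpA

# Negative indices count from the right.
print(word[-2:])        # pA
```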

Slicing strings

Once a string is defined, we can extract any part of it as a new string. This operation is called slicing. Slicing a string works exactly like slicing a list, which makes sense intuitively because a string is just a sequence of characters.

>>> a_string = "My alphabet starts where your alphabet ends."
>>> a_string[3:11]
"alphabet"
>>> a_string[3:-3]
"alphabet starts where your alphabet en"
>>> a_string[0:2]
"My"
>>> a_string[:18]
"My alphabet starts"
>>> a_string[18:]
" where your alphabet ends."

We can get a slice of the original string by specifying two indices. The result is a new string containing all the characters from the first index up to, but not including, the second index.
As with slicing lists, negative indices can also be used to slice strings.
String indices start at zero, so a_string[0:2] returns the first two items of the string, starting at a_string[0], up to but not including a_string[2].
If the first index is omitted, Python assumes 0. So a_string[:18] is the same as a_string[0:18].
Similarly, if the second index is the length of the string, it can be omitted. So a_string[18:] is the same as a_string[18:44], because the string happens to have 44 characters. There is a pleasing symmetry here: in this 44-character string, a_string[:18] returns the first 18 characters and a_string[18:] returns everything but the first 18 characters. In fact, a_string[:n] always returns the first n characters and a_string[n:] returns the rest, regardless of the length of the string.

String length

The built-in function len() returns the length of a string:

>>> s = 'supercalifragilisticexpialidocious'
>>> len(s)
34

Python's built-in string handling functions

Substrings

Python has no substring function, because str[start:end] can be used directly.

Case conversion

All uppercase: str.upper()
All lowercase: str.lower()
Swap upper and lower case: str.swapcase()
First letter uppercase, the rest lowercase: str.capitalize()
First letter of every word uppercase: str.title()

s = 'python String function'  # sample string
print('%s lower=%s' % (s, s.lower()))
print('%s upper=%s' % (s, s.upper()))
print('%s swapcase=%s' % (s, s.swapcase()))
print('%s capitalize=%s' % (s, s.capitalize()))
print('%s title=%s' % (s, s.title()))

Padding and alignment

Fixed width, right-aligned, padded with spaces on the left: str.rjust(width)
Fixed width, left-aligned, padded with spaces on the right: str.ljust(width)
Fixed width, centered, padded with spaces on both sides: str.center(width)
Fixed width, right-aligned, padded with zeros on the left: str.zfill(width)

s = 'python String function'  # sample string
print('%s ljust=%s' % (s, s.ljust(20)))
print('%s rjust=%s' % (s, s.rjust(20)))
print('%s center=%s' % (s, s.center(20)))
print('%s zfill=%s' % (s, s.zfill(20)))
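Because the sample string used in this section is longer than 20 characters, the padding has no visible effect there. The sketch below uses a shorter, made-up string and repr() so that the padding can be seen.

```python
s = 'python'

left = s.ljust(10)      # pad with spaces on the right
right = s.rjust(10)     # pad with spaces on the left
middle = s.center(10)   # pad with spaces on both sides
zeros = '42'.zfill(6)   # pad with zeros on the left

# repr() keeps the surrounding quotes, which makes the spaces visible.
print(repr(left))
print(repr(right))
print(repr(middle))
print(repr(zeros))
```

A string that is already at least as wide as the requested width is returned unchanged by all four methods.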

String searching

Search for a substring, returning -1 if it is not found: str.find('t')

Finding all positions of any of several characters in a string:

a = "dhvka,feovj.dlfida?dfka.dlviaj,dlvjaoid,vjaoj?"
b = [i for i, j in enumerate(a) if j in [',', '.', '?']]
print(b)

[5, 11, 18, 23, 30, 39, 45]

Search from a given start position: str.find('t', start)
Search between given start and end positions: str.find('t', start, end)
Search from the right: str.rfind('t')
Count the occurrences of a substring: str.count('t')
The find calls above can also be written with index (and rfind with rindex). The difference is that index raises ValueError when the substring is not found, whereas find returns -1.

s = 'python String function'  # sample string
print('%s find nono=%d' % (s, s.find('nono')))
print('%s find t=%d' % (s, s.find('t')))
print('%s find t from %d=%d' % (s, 1, s.find('t', 1)))
print('%s find t from %d to %d=%d' % (s, 1, 2, s.find('t', 1, 2)))
# print('%s index nono=%d' % (s, s.index('nono', 1, 2)))  # raises ValueError
print('%s rfind t=%d' % (s, s.rfind('t')))
print('%s count t=%d' % (s, s.count('t')))
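The difference between find and index can be made concrete with a short sketch. The helper position_or_none is a hypothetical name introduced only for this example; it is not part of the standard library.

```python
s = 'python String function'

def position_or_none(text, sub):
    """Return the index of sub in text, or None when it is absent."""
    try:
        return text.index(sub)   # raises ValueError when sub is missing
    except ValueError:
        return None

print(s.find('nono'))                 # -1
print(position_or_none(s, 'nono'))    # None
print(s.find('t'), s.rfind('t'), s.count('t'))
```

Returning -1 is convenient in conditions, but it is also a valid index (the last character), so an exception or None is the safer signal when the result is used for slicing.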

Ways to test whether a string contains a substring

1. if 'abcde'.__contains__('abc')

2. if 'abc' in 'abcde'

3. 'abcde'.find('bcd') >= 0

4. 'abcde'.count('bcd') > 0

5. With index(), where ls is the string and ss is the substring:

try:
    ls.index(ss)
    print('find it')
except ValueError:
    print('fail')

[http://blog.csdn.net/elvis_kwok/article/details/7405083]

[Python string operations]

String replacement

Replace old with new: str.replace('old', 'new')
Replace old with new at most a given number of times: str.replace('old', 'new', maxReplaceTimes)

s = 'python String function'  # sample string
print('%s replace t to *=%s' % (s, s.replace('t', '*')))
print('%s replace t to *=%s' % (s, s.replace('t', '*', 1)))

Replacing several characters at once

Remove the [ and ] characters (and single quotes) from a string by replacing them with an empty string.

Method 1: tags = re.sub(r"\[|\]|'", "", str(music_tag[music_name]))

Method 3: [Replace multiple strings at a time]
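As a rough sketch of removing several characters in one pass, the example below shows a regular expression with a character class and, as an alternative that the text above does not mention, str.translate. The sample value of text is hypothetical.

```python
import re

text = "['rock', 'pop']"   # hypothetical sample value

# Option 1: one character class that matches [, ] and the single quote.
cleaned_re = re.sub(r"[\[\]']", "", text)

# Option 2: a translation table whose third argument lists characters to delete.
table = str.maketrans("", "", "[]'")
cleaned_tr = text.translate(table)

print(cleaned_re)
print(cleaned_tr)
```

For plain single-character removal, str.translate avoids regular-expression escaping altogether; re.sub is the better fit once the things to replace are patterns rather than fixed characters.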

Stripping whitespace and characters

Strip whitespace from both ends: str.strip()
Strip whitespace from the left: str.lstrip()
Strip whitespace from the right: str.rstrip()

Strip given characters from both ends: s.strip(chars). The argument is treated as a set of characters to remove, not as a regular expression.

Split a string into a list on a separator: str.split()
By default it splits on whitespace.
To specify the separator: str.split('-')
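A short sketch of how the stripping and splitting calls above behave; the sample strings are made up for this example.

```python
line = '  --alpha-beta--  '

stripped = line.strip()           # whitespace only
trimmed = line.strip(' -')        # any mix of spaces and hyphens at both ends

print(repr(stripped))
print(repr(trimmed))

# split() with no argument splits on runs of whitespace and drops empty items.
print('a  b   c'.split())

# split(' ') splits on every single space, so empty strings can appear.
print('a  b'.split(' '))

print(trimmed.split('-'))
```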

String tests

Starts with 'start': str.startswith('start')
Ends with 'end': str.endswith('end')
All letters or digits: str.isalnum()
All letters: str.isalpha()
All digits: str.isdigit()
All lowercase: str.islower()
All uppercase: str.isupper()
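The tests above all return True or False. The sketch below runs each of them on made-up sample values.

```python
checks = {
    "startswith": 'start-of-line'.startswith('start'),
    "endswith": 'the end'.endswith('end'),
    "isalnum": 'abc123'.isalnum(),
    "isalpha": 'abc123'.isalpha(),       # False: contains digits
    "isdigit": '2024'.isdigit(),
    "isdigit_signed": '-12.3'.isdigit(), # False: sign and dot are not digits
    "islower": 'hello world'.islower(),  # True: only cased characters are checked
    "isupper": 'HELLO'.isupper(),
}

for name, result in checks.items():
    print(name, result)
```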

The print examples above use the sample string 'python String function'.

[Collected Python built-in string handling functions]

String-related operations

repr() (the backtick operation)

Python 2 had a special syntax for getting a string representation of an arbitrary object: wrapping it in backticks (such as `x`). In Python 3 this capability still exists, but you can no longer use backticks to get it. Use the global function repr() instead.

Notes	Python 2	Python 3
①	`x`	repr(x)
②	`'PapayaWhip' + `2``	repr('PapayaWhip' + repr(2))

① Remember, x can be anything: a class, a function, a module, a basic data type, and so on. The repr() function accepts an argument of any type.
② In Python 2, backticks could be nested, leading to this kind of puzzling (but valid) expression. The 2to3 tool is smart enough to convert such a nested expression into nested repr() calls.
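A minimal Python 3 sketch of the repr() calls from the table, with a comparison against str() added for illustration:

```python
inner = repr(2)                        # the string '2'
nested = repr('PapayaWhip' + inner)    # repr of the string 'PapayaWhip2'

print(nested)

# str() gives the readable form, repr() the form with quotes and escapes.
print(str('spam'), repr('spam'))
print(repr([1, 'a']), repr(None))
```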

Splitting strings

Splitting strings using multiple delimiters

Solution 1: [http://python3-cookbook.readthedocs.org/zh_CN/latest/c02/p01_split_string_on_multiple_delimiters.html]

The split() method of string objects only suits very simple cases: it does not allow multiple delimiters or a variable amount of whitespace around the delimiters. When you need to split strings more flexibly, use re.split() instead:

>>> line = 'asdf fjdk; afed, fjek,asdf, foo'
>>> import re
>>> re.split(r'[;,\s]\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']

Solution 2:

s = 'Hello!This?Is!What?I!Want'
for i in ('!', '?'):
    s = s.replace(i,' ')

list1 = s.split()

Solution 3:

def my_split(s, seps):
    res = [s]
    for sep in seps:
        s, res = res, []
        for seq in s:
            res += seq.split(sep)
    return res

my_split(s, ['!', '?'])

Format strings

Since Python 2.6 there is a new method for formatting strings, str.format(). It is more powerful than the older % formatting.

Syntax

It uses {} and : in place of %.

"Mapping" examples

By position

In [1]: '{0},{1}'.format('kzc',18)  
Out[1]: 'kzc,18'  
In [2]: '{},{}'.format('kzc',18)  
Out[2]: 'kzc,18'  
In [3]: '{1},{0},{1}'.format('kzc',18)  
Out[3]: '18,kzc,18'

str.format() accepts any number of arguments, and positions may be used out of order, left unused, or used several times. Note that empty {} fields are not allowed in Python 2.6 but are allowed from 2.7 on.

By keyword argument

In [5]: '{name},{age}'.format(age=18,name='kzc')  
Out[5]: 'kzc,18'

By object attribute

class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age

    def __str__(self):
        return 'This guy is {self.name},is {self.age} old'.format(self=self)

In [2]: str(Person('kzc',18))  
Out[2]: 'This guy is kzc,is 18 old'

By index

In [7]: p=['kzc',18]
In [8]: '{0[0]},{0[1]}'.format(p)
Out[8]: 'kzc,18'

With these convenient "mapping" forms, we have a handy shortcut. Basic Python tells us that a list or tuple can be unpacked into positional arguments for a function, and a dict can be unpacked into keyword arguments (using * and **). So it is easy to pass a list, tuple, or dict to format(). Very flexible.
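The unpacking described above can be sketched as follows. The variable names are made up for this example, and str.format_map is shown as an additional option available in Python 3.2 and later.

```python
person_list = ['kzc', 18]
person_dict = {'name': 'kzc', 'age': 18}

# * unpacks a list or tuple into positional arguments.
by_position = '{0},{1}'.format(*person_list)

# ** unpacks a dict into keyword arguments.
by_keyword = '{name},{age}'.format(**person_dict)

# format_map takes the mapping directly, without unpacking it.
by_mapping = '{name},{age}'.format_map(person_dict)

print(by_position, by_keyword, by_mapping)
```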

Format specifiers

str.format() supports a rich set of format specifiers (written after a : inside the {}), for example:

Fill and alignment
Fill is usually used together with alignment.
^, <, and > mean centered, left-aligned, and right-aligned respectively, and are followed by the width. The fill character comes right after the colon and must be a single character; if it is omitted, spaces are used.
For example:

In [15]: '{:>8}'.format('189')
Out[15]: '     189'
In [16]: '{:0>8}'.format('189')
Out[16]: '00000189'
In [17]: '{:a>8}'.format('189')
Out[17]: 'aaaaa189'
print('{:*^60}'.format(' Number of data records per TERMINALNO '))

(The original title string was in Chinese. A Chinese character counts as only one character toward the width, even though it usually takes up two columns when displayed, so centered output containing Chinese text may look misaligned.)

Note: to format a list with a specifier such as {:>20}, first convert the list with str(); otherwise you get TypeError: unsupported format string passed to list.__format__
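The note about lists can be reproduced with a short sketch. The list items is a made-up example, and the !s conversion is shown as an alternative to calling str() yourself.

```python
items = [1, 2, 3]

# A list does not understand alignment specifiers, so this raises TypeError.
try:
    '{:>20}'.format(items)
    failed = False
except TypeError as exc:
    failed = True
    print('TypeError:', exc)

# Converting to a string first makes the specifier apply to the text.
padded = '{:>20}'.format(str(items))

# The !s conversion calls str() on the argument before formatting.
also_padded = '{!s:>20}'.format(items)

print(repr(padded))
```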

Precision and type f
Precision is usually used together with type f.

In [44]: '{:.2f}'.format(321.33345)
Out[44]: '321.33'

Here .2 means a precision of two decimal places, and f means the value is formatted as a fixed-point float.

Other types
These are mainly number bases: b, d, o, and x stand for binary, decimal, octal, and hexadecimal.

In [54]: '{:b}'.format(17)
Out[54]: '10001'
In [55]: '{:d}'.format(17)
Out[55]: '17'
In [56]: '{:o}'.format(17)
Out[56]: '21'
In [57]: '{:x}'.format(17)
Out[57]: '11'

Setting the width of binary output: 32 digits, padded with leading zeros.

print('{:032b}'.format(i))  # compare with bin(i)

A comma can also be used as the thousands separator for amounts.

In [47]: '{:,}'.format(1234567890)
Out[47]: '1,234,567,890'

[http://blog.csdn.net/handsomekang/article/details/9183303]

[String format specifications: functions and examples]

Formatting strings

Strings can be defined with either single quotes or double quotes.

Let's take another look at humansize.py:

SUFFIXES = {1000: ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"],
            1024: ["KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]}

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    """Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: string

    """
    if size < 0:
        raise ValueError("number must be non-negative")

    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    for suffix in SUFFIXES[multiple]:
        size /= multiple
        if size < multiple:
            return "{0:.1f} {1}".format(size, suffix)

    raise ValueError("number too large")

"KB", "MB", "GB" and so on are all strings.
The function's docstring is also a string. It spans multiple lines, so it uses three quotes in a row to mark the start and end of the string.
The three quotes after "Returns: string" end the docstring.
"number must be non-negative" is another string, passed to the exception as a human-readable error message.
"{0:.1f} {1}" ... whoa, what is that?

Python 3 supports formatting values into strings. Although this can involve very complicated expressions, the most basic usage is to insert a value into a string with a single placeholder.

>>> username = "mark"
>>> password = "PapayaWhip"
>>> "{0}'s password is {1}".format(username, password)
"mark's password is PapayaWhip"

No, PapayaWhip is not really my password.
There is a lot going on here. First, a method is being called on a string literal. Strings are objects, and objects have methods. Second, the whole expression evaluates to a string. Third, {0} and {1} are replacement fields, which are replaced by the arguments passed to the format() method.

Compound field names

In the previous example the replacement fields were simple integers, which is the simplest case. An integer replacement field is treated as a positional index into the argument list of the format() method. That is, {0} is replaced by the first argument (username in this case), {1} by the second argument (password), and so on. You can have as many positional indices as you have arguments, and you can pass as many arguments as you like. But replacement fields are much more powerful than that.

>>> import humansize
>>> si_suffixes = humansize.SUFFIXES[1000]
>>> si_suffixes
["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
>>> "1000{0[0]} = 1{0[1]}".format(si_suffixes)
"1000KB = 1MB"

Without calling any function defined by the humansize module, we can grab one of the data structures it defines: the list of SI (International System of Units, powers of 1000) suffixes.
This looks complicated, but it is not. {0} refers to the first argument passed to format(), namely si_suffixes. But si_suffixes is a list, so {0[0]} refers to the first item of the list: "KB". Likewise, {0[1]} refers to the second item of the same list: "MB". Everything outside the curly braces, including 1000, the equals sign, and the spaces, is left untouched. The final result is the string "1000KB = 1MB".

{0} is replaced by the first argument of format(), and {1} by the second.

This example shows that format specifiers can access items and attributes of data structures using (almost) Python syntax. This is called compound field names. The following compound field names all work:

Passing a list and accessing an item of the list by index (as in the previous example)
Passing a dictionary and accessing a value by key
Passing a module and accessing its variables and functions by name
Passing a class instance and accessing its properties and methods by name
Any combination of the above

To drive the point home, here is an example that combines all of the above:

>>> import humansize
>>> import sys
>>> "1MB = 1000{0.modules[humansize].SUFFIXES[1000][0]}".format(sys)
"1MB = 1000KB"

Here is how it works:

The sys module holds information about the currently running Python instance. Since it has just been imported, the sys module itself can be passed as an argument to format(). So the replacement field {0} refers to the sys module.
sys.modules is a dictionary of all the modules that have been imported in this Python instance. The keys are the module names as strings; the values are the module objects themselves. So the replacement field {0.modules} refers to the dictionary of imported modules.
sys.modules["humansize"] is the humansize module that was just imported. So the replacement field {0.modules[humansize]} refers to the humansize module. Note the slight difference in syntax. In real Python code, the keys of the sys.modules dictionary are strings, so you need quotes around the module name (such as "humansize"). Inside a replacement field, the quotes around the dictionary key are omitted (such as humansize). PEP 3101 (Advanced String Formatting) explains that the rules for parsing an item key are very simple: if it starts with a digit it is treated as a number, otherwise it is used as a string.
sys.modules["humansize"].SUFFIXES is the dictionary defined at the top of the humansize module. The replacement field {0.modules[humansize].SUFFIXES} refers to that dictionary.
sys.modules["humansize"].SUFFIXES[1000] is the list of SI (International System of Units) suffixes: ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]. So the replacement field {0.modules[humansize].SUFFIXES[1000]} refers to that list.
sys.modules["humansize"].SUFFIXES[1000][0] is the first item of the list of SI suffixes: "KB". Therefore the complete replacement field {0.modules[humansize].SUFFIXES[1000][0]} is replaced by the two-character string KB.

Format specifiers

There is still something we have not covered. Let's take another look at that strange line of code in humansize.py:

        if size < multiple:
            return "{0:.1f} {1}".format(size, suffix)

{1} is replaced by the second argument passed to format(), which is suffix. But what is {0:.1f}? It has two parts: {0}, which you already understand, and :.1f, which you may not. The second part, from the colon onward, is the format specifier, which further defines how the replaced value should be formatted.

Format specifiers let you adjust the replacement text in a variety of useful ways, much like the printf() function in C. You can add zero-padding or space-padding, align strings, control the precision of decimal numbers, and even convert numbers to hexadecimal.

Within a replacement field, a colon (:) marks the start of the format specifier. The specifier .1 means round to one decimal place. The specifier f means fixed-point number (as opposed to exponential notation or some other decimal representation). So, given a size of 698.24 and a suffix of "GB", the formatted string is "698.2 GB", because 698.24 is rounded to one decimal place and then the suffix is appended after the number.

>>> "{0:.1f} {1}".format(698.24, "GB")
"698.2 GB"

For all the details of format specifiers, see the Format Specification Mini-Language section of the official Python documentation.
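As a quick tour of the specifiers discussed in this section, the sketch below combines precision, width, fill, alignment, base conversion, and the thousands separator. The sample values are arbitrary.

```python
size_text = '{0:.1f} {1}'.format(698.24, 'GB')            # one decimal place
zero_padded = '{:08.3f}'.format(3.14159)                   # width 8, zero fill, 3 decimals
aligned = '{:>6}|{:<6}|{:^6}'.format('a', 'b', 'c')        # right, left, centered
hex_text = '{:x}'.format(255)                              # hexadecimal
grouped = '{:,}'.format(1234567890)                        # thousands separator

for text in (size_text, zero_padded, aligned, hex_text, grouped):
    print(repr(text))
```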

Determining whether a string represents a number

import re

def IsFloatStr(s):
    '''
    Floating-point check
    '''
    try:
        float(s)
        # s.isdigit()  # This is wrong: it rejects signs and decimal points.
        # Regular-expression implementation:
        # return True if re.match(r'[-+]*\d+?\.{0,1}\d*$', s) else False
        return True
    except (TypeError, ValueError):
        return False

def IsIntStr(s):
    '''
    Integer check
    '''
    try:
        int(s)
        # s.isdigit()  # This is wrong: it rejects signs.
        # return True if re.match(r'[-+]*\d+$', s) else False
        return True
    except (TypeError, ValueError):
        return False

for s in ['123', '-123', '+123', '-12.3', '-1.2.3', '123hello']:
    print('%s is a float str' % s if IsFloatStr(s) else '%s is not a float str' % s)
    print('%s is an int str' % s if IsIntStr(s) else '%s is not an int str' % s)

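Removing whitespace characters from a string: line breaks, spaces, and tabs

Splitting on whitespace and joining with a single space collapses every run of whitespace into one space:

print(' '.join("Please \n don't \t hurt me.".split()))
Output: Please don't hurt me.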

Reversing a string character by character or word by word

Reverse a string by character or by word.

1. revchars = astring[::-1]

x = 'abcd'
In [66]: x[::-1]
Out[66]: 'dcba'

2. Use reversed(). Note that it returns an iterator, which can be used in a loop or passed to another "accumulator" such as join(), rather than a finished string.

revchars = ''.join(reversed(astring))

y = reversed(x)
In [57]: y
Out[57]: <reversed object at 0x058302F0>
In [58]: ''.join(y)
Out[58]: 'dcba'

Word-by-word reversal
1. Split the string into a list of words, reverse the list, and merge it with join.

s = 'Today is really a good day'

rev = ' '.join(s.split()[::-1])

2. The same can be done with reversed(). To preserve the original whitespace between words, split with a regular expression instead.

revwords = ' '.join(reversed(s.split()))
revwords = ''.join(reversed(re.split(r'(\s+)', s)))

[Reversing a Python string by character or by word]

Other common string techniques

Besides formatting, strings support many other useful operations.

>>> s = """Finished files are the re-
... sult of years of scientif-
... ic study combined with the
... experience of years."""
>>> s.splitlines()
["Finished files are the re-", "sult of years of scientif-", "ic study combined with the", "experience of years."]
>>> print(s.lower())
finished files are the re-
sult of years of scientif-
ic study combined with the
experience of years.
>>> s.lower().count("f")
6

You can enter multi-line strings in the Python interactive shell. Once you start a multi-line string with triple quotes, pressing ENTER makes the shell prompt you to continue the string. Typing the closing triple quotes ends the string, and the next ENTER executes the command (in this case, assigning the string to the variable s).
The splitlines() method takes one multi-line string and returns a list of strings, one for each line of the original. Note that the line endings are not included in the items.
The lower() method converts the entire string to lowercase. (Similarly, the upper() method converts a string to uppercase.)
The count() method counts the number of occurrences of a substring. Yes, there really are six "f"s in that sentence.

Here is another common case. Suppose you have a list of key-value pairs in the form key1=value1&key2=value2, and you want to split it up and build a dictionary of the form {key1: value1, key2: value2}.

>>> query = "user=pilgrim&database=master&password=PapayaWhip"
>>> a_list = query.split("&")
>>> a_list
["user=pilgrim", "database=master", "password=PapayaWhip"]
>>> a_list_of_lists = [v.split("=", 1) for v in a_list]
>>> a_list_of_lists
[["user", "pilgrim"], ["database", "master"], ["password", "PapayaWhip"]]
>>> a_dict = dict(a_list_of_lists)
>>> a_dict
{"password": "PapayaWhip", "user": "pilgrim", "database": "master"}

The split() method takes one argument, a delimiter, and splits the string into a list of strings on that delimiter. Here the delimiter is the ampersand character (&), but it could be anything.
Now we have a list of strings, each consisting of a key, an equals sign, and a value. We can use a list comprehension to iterate over the list and split each string in two at the first equals sign. The optional second argument of split() is the maximum number of splits; 1 means "split only once". (In theory a value could contain an equals sign too. Calling "key=value=foo".split("=") would give a three-item list, ["key", "value", "foo"].)
Finally, Python turns that list of lists into a dictionary simply by passing it to the dict() function.

The previous example looks a lot like parsing the query parameters of a URL, but real URL parsing is considerably more complicated than this. If you need to handle URL query parameters, you are better off using the urllib.parse.parse_qs() function, which handles some non-obvious edge cases.
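To illustrate the difference, the sketch below parses the same query string by hand and with urllib.parse.parse_qs(). The second query string is a made-up example of the edge cases that the hand-rolled version does not handle: percent-encoding and repeated keys.

```python
from urllib.parse import parse_qs

query = 'user=pilgrim&database=master&password=PapayaWhip'

# Hand-rolled version from the text: split on &, then on the first = only.
hand_rolled = dict(pair.split('=', 1) for pair in query.split('&'))

# parse_qs returns a list of values per key, because keys may repeat.
parsed = parse_qs(query)

# Percent-decoding and repeated keys are handled for you.
tricky = parse_qs('q=a%20b&q=c')

print(hand_rolled)
print(parsed)
print(tricky)
```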

Differences in string encoding between Python 2.x and 3.x

String vs. Bytes

The difference between Python string objects and bytes objects

Bytes are bytes; characters are an abstraction. An immutable sequence of Unicode characters is called a string.

A bytes object is an immutable sequence of numbers between 0 and 255.

>>> by = b'abcde'
>>> len(by)
5
>>> by += b'\xff'
>>> by
b'abcde\xff'
>>> by[0]
97
>>> by[0] = 102
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bytes' object does not support item assignment

To define a bytes object, use the b'' byte literal syntax. Each byte within the literal can be an ASCII character or an encoded hexadecimal number from \x00 to \xff (0 to 255). The type of a bytes object is bytes.
As with strings, you can use the + operator to concatenate bytes objects. The result is a new bytes object. Concatenating a 5-byte bytes object and a 1-byte bytes object gives a 6-byte bytes object.
As with lists and strings, you can use index notation to get an individual byte from a bytes object. The items of a string are strings; the items of a bytes object are integers, specifically integers between 0 and 255.
A bytes object is immutable; you cannot assign to individual bytes. If you need to change individual bytes, you can either use slicing and concatenation (which work the same as for strings), or convert the bytes object into a bytearray object.

>>> by = b'abcde'
>>> barr = bytearray(by)
>>> barr
bytearray(b'abcde')
>>> barr[0] = 102
>>> barr
bytearray(b'fbcde')

All the methods and operations of bytes objects are also available on bytearray objects.
The one difference is that with a bytearray object you can assign to individual bytes using index notation. The assigned value must be an integer between 0 and 255.

The one thing you can never do is mix bytes and strings.

>>> by = b'd'
>>> s = 'abcde'
>>> by + s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
>>> s.count(by)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'bytes' object to str implicitly
>>> s.count(by.decode('ascii'))
1

You cannot concatenate bytes objects and strings. They are two different data types.
You cannot count the occurrences of a bytes object in a string either, because a string contains no bytes at all. A string is a sequence of characters. If what you meant was to count the occurrences of the string obtained by decoding those bytes in a particular encoding, you have to say so explicitly. Python 3 never implicitly converts bytes to strings or strings to bytes.

Comparing a string with a bytes object is always False, even when the contents look the same:

>>> print('pro' == b'pro')
False
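Since Python 3 never converts between the two types implicitly, the comparison only succeeds after an explicit encode() or decode(). A minimal sketch, using ASCII as the assumed encoding:

```python
text = 'pro'
data = b'pro'

mixed = (text == data)                       # str vs bytes: never equal
as_bytes = (text.encode('ascii') == data)    # compare as bytes
as_text = (data.decode('ascii') == text)     # compare as strings

print(mixed, as_bytes, as_text)
```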

The relationship between strings and bytes objects

bytes objects have a decode() method that takes a character encoding and returns a string. Correspondingly, strings have an encode() method that takes a character encoding and returns a bytes object.

In the previous example the decoding was relatively straightforward: a sequence of bytes in the ASCII encoding was converted into a string. The same process works with any encoding that supports the characters of the string, even legacy (non-Unicode) encodings.

>>> a_string = '深入 Python'
>>> len(a_string)
9
>>> by = a_string.encode('utf-8')
>>> by
b'\xe6\xb7\xb1\xe5\x85\xa5 Python'
>>> len(by)
13
>>> by = a_string.encode('gb18030')
>>> by
b'\xc9\xee\xc8\xeb Python'
>>> len(by)
11
>>> by = a_string.encode('big5')
>>> by
b'\xb2`\xa4J Python'
>>> len(by)
11
>>> roundtrip = by.decode('big5')
>>> roundtrip
'深入 Python'
>>> a_string == roundtrip
True

Note: roundtrip is a string of nine characters. It is the sequence of characters obtained by decoding by with the Big5 algorithm. As the result shows, roundtrip is identical to a_string.
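The round trip can also be checked for several encodings in a loop. This sketch assumes the nine-character sample string consists of the two Chinese characters U+6DF1 and U+5165 followed by " Python"; it is written with escapes so the code does not depend on the source file encoding.

```python
a_string = '\u6df1\u5165 Python'   # two Chinese characters, a space, and "Python"

lengths = {}
for encoding in ('utf-8', 'gb18030', 'big5'):
    encoded = a_string.encode(encoding)
    # Decoding with the same encoding must give back the original string.
    assert encoded.decode(encoding) == a_string
    lengths[encoding] = len(encoded)

print(len(a_string), lengths)

# An encoding that lacks the characters fails loudly instead of guessing.
try:
    a_string.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

The character count stays at nine in every case; only the number of bytes changes with the encoding.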

[Python string encodings and how they differ]

from:http://blog.csdn.net/pipisorry/article/details/42085723

ref: Python Built in string method (Collector only)
