How to access the individual elements of a matrix in Python


My file is of the form

01 "\t" 10.19 "\t" 0.00 "\t" 10.65
02 "\t" 11.19 "\t" 10.12 "\t" 99.99

and I need to access the individual floating-point numbers in it.
For example, the first number is 10.19; I want to access it and add one to it.

filename=open("half.transfac","r")
file_content=filename.readlines()
sam=""
for line in file_content:
    for char in line:
        if char=="\tchar\t\n":
            sam+=char
            print sam

The "for char" loop accesses every single character and not the whole numbers ("10.19", "0.00", etc.). How do I do this? Please help.

Jul 5 '07 #1


dshimer

136

Expert 100+

There are powerful ways to do this all in one line, but by way of explanation, start by using split() to separate each line into individual data lists for further manipulation.

>>> for line in file_content:
...     line.split()
...
['01', '10.19', '0.00', '10.65']
['02', '11.19', '10.12', '99.99']

Each of which could be appended to an empty list, forming a multi-dimensional data set. Note that all the elements are strings and would have to be converted to numbers before the math.

>>> datalist=[]
>>> for line in file_content:
...     datalist.append(line.split())
...
>>> datalist
[['01', '10.19', '0.00', '10.65'], ['02', '11.19', '10.12', '99.99']]
>>> int(datalist[0][0])
1
>>> float(datalist[0][1])
10.19
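Putting the steps above together, here is a minimal sketch that adds one to the first floating-point value of the first row. It is written in Python 3 syntax (the thread itself uses Python 2), the sample lines stand in for the contents of the file, and the names rows and bumped are only illustrative.

```python
# Sample lines standing in for file_content (tab-separated, as in the question).
file_content = ["01\t10.19\t0.00\t10.65\n", "02\t11.19\t10.12\t99.99\n"]

rows = []
for line in file_content:
    fields = line.split()                      # split on any whitespace, tabs included
    position = fields[0]                       # keep the position label as a string
    numbers = [float(x) for x in fields[1:]]   # convert the remaining fields to floats
    rows.append((position, numbers))

# Add one to the first number of the first row (10.19 in the question).
bumped = rows[0][1][0] + 1
print(round(bumped, 2))
```

Keeping the position as a string preserves the leading zero, which would be lost after int().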


Jul 5 '07 #2

bvdet

2,851

Expert Mod 2GB


Here is another way to access the numbers, using a dictionary:

>>> lineList = open(file_name).readlines()
>>> dataDict = {}
>>> for line in lineList:
...     line = line.split()
...     dataDict[line[0]] = [float(r) for r in line[1:]]
...
>>> dataDict
{'02': [11.19, 10.119999999999999, 99.989999999999995], '01': [10.19, 0.0, 10.65]}
>>> dataDict['01'][0]
10.19
>>> dataDict['02'][2]
99.989999999999995

You can perform mathematical operations on elements of the dictionary:

>>> dataDict
{'02': [11.19, 10.119999999999999, 99.989999999999995], '01': [10.19, 0.0, 10.65]}
>>> dataDict['01'][0] += 1
>>> dataDict
{'02': [11.19, 10.119999999999999, 99.989999999999995], '01': [11.19, 0.0, 10.65]}
>>>

To access individual characters:

>>> for key in dataDict:
...     for item in dataDict[key]:
...         print ' '.join([ch for ch in '%0.2f' % item])
...
1 1 . 1 9
1 0 . 1 2
9 9 . 9 9
1 0 . 1 9
0 . 0 0
1 0 . 6 5

>>> [ch for ch in '%0.2f' % dataDict['01'][0]]
['1', '1', '.', '1', '9']

Jul 5 '07 #3

aboxylica

111

100+

This is my file. As you can see, the first column gives the position, and the second, third, etc. columns are part of the matrix. My problem is that I have to write code to access the individual values: if I say A[01] I should get the value 0.00, if I say A[06] I should get 3.46, C[04] should give 3.67, and so on. I also have to add one to each of these elements. Can you give me the code? My code is not working.

NA bap
PO A C G T
01 0.00 3.67 0.00 0.00
02 0.00 0.00 3.67 0.00
03 0.00 0.00 0.00 3.67
04 0.00 3.67 0.00 0.00
05 3.67 0.00 0.00 0.00
06 3.46 0.00 0.22 0.00
07 0.00 0.00 3.67 0.00
08 0.00 0.00 0.00 3.67
09 0.00 0.00 0.00 3.67
10 0.00 3.67 0.00 0.00
11 3.67 0.00 0.00 0.00
12 3.67 0.00 0.00 0.00
13 0.00 0.00 3.67 0.00
14 0.00 0.00 0.00 3.67
15 0.00 0.00 3.67 0.00
16 0.00 3.67 0.00 0.00
//
//
NA bcd

Jul 6 '07 #4

bartonc

6,596

Expert 4TB

Files operate much like lists, so I always practice with a list, then go to a file.
Here's one way to look at such data in Python:

rawdata = \
['01 0.00 3.67 0.00 0.00',
 '02 0.00 0.00 3.67 0.00',
 '03 0.00 0.00 0.00 3.67',
 '04 0.00 3.67 0.00 0.00',
 '05 3.67 0.00 0.00 0.00',
 '06 3.46 0.00 0.22 0.00',
 '07 0.00 0.00 3.67 0.00',
 '08 0.00 0.00 0.00 3.67',
 '09 0.00 0.00 0.00 3.67',
 '10 0.00 3.67 0.00 0.00',
 '11 3.67 0.00 0.00 0.00',
 '12 3.67 0.00 0.00 0.00',
 '13 0.00 0.00 3.67 0.00',
 '14 0.00 0.00 0.00 3.67',
 '15 0.00 0.00 3.67 0.00']

datadictionary = {} # usually shorten the name to dd

for line in rawdata:
    items = line.split()
    key = items[0]
    datadictionary[key] = [float(item) for item in items[1:]]

print datadictionary['09'][3]
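Because a file object yields lines the same way a list does, the loop above should carry over to a file without changes. The sketch below illustrates the idea in Python 3 syntax, with io.StringIO standing in for a real file opened with open(); the helper name load_rows is hypothetical.

```python
import io

def load_rows(lines):
    """Build {position: [floats]} from any iterable of text lines."""
    table = {}
    for line in lines:
        items = line.split()
        if not items:                     # skip blank lines
            continue
        table[items[0]] = [float(item) for item in items[1:]]
    return table

raw_list = ['08 0.00 0.00 0.00 3.67', '09 0.00 0.00 0.00 3.67']
fake_file = io.StringIO("08 0.00 0.00 0.00 3.67\n09 0.00 0.00 0.00 3.67\n")

from_list = load_rows(raw_list)
from_file = load_rows(fake_file)          # a real file object is iterated the same way
print(from_file['09'][3])
```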

Jul 6 '07 #5

bartonc

6,596

Expert 4TB

Jul 6 '07 #6

aboxylica

111

100+

I have to consider the columns as well.
For example, this is my file:
A T G C
01 1.00 2.00 3.00 4.00
02 2.00 3.00 4.00 5.00

Now if I say A[01] I should get the value 1.00, and C[01] should give 4.00.
A, T, G, C are the column names. I am not able to format my query file properly; I hope it is understood.


Jul 6 '07 #7

bartonc

6,596

Expert 4TB


I hope you understand that you should be thinking "row 0n, column X", not the other way around. Rows enclose columns, so the row is the first index that you deal with.

A = 0; T = 1; G = 2; C = 3

rawdata = \
['01 0.00 3.67 0.00 0.00',
 '02 0.00 0.00 3.67 0.00',
 '03 0.00 0.00 0.00 3.67',
 '04 0.00 3.67 0.00 0.00',
 '05 3.67 0.00 0.00 0.00',
 '06 3.46 0.00 0.22 0.00',
 '07 0.00 0.00 3.67 0.00',
 '08 0.00 0.00 0.00 3.67',
 '09 0.00 0.00 0.00 3.67',
 '10 0.00 3.67 0.00 0.00',
 '11 3.67 0.00 0.00 0.00',
 '12 3.67 0.00 0.00 0.00',
 '13 0.00 0.00 3.67 0.00',
 '14 0.00 0.00 0.00 3.67',
 '15 0.00 0.00 3.67 0.00']

datadictionary = {} # usually shorten the name to dd

for line in rawdata:
    items = line.split()
    key = items[0]
    datadictionary[key] = [float(item) for item in items[1:]]

print datadictionary['09'][C]
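Rather than typing the column numbers by hand, the indices could be derived from the header line, which avoids mistakes when the column order differs between files (the sample file posted earlier lists its columns as PO A C G T). A minimal sketch in Python 3 syntax; the name column_index is hypothetical.

```python
header = "PO A C G T"
rawdata = ['04 0.00 3.67 0.00 0.00',
           '09 0.00 0.00 0.00 3.67']

# Map each column letter to its position among the numeric fields.
column_index = {name: i for i, name in enumerate(header.split()[1:])}

datadictionary = {}
for line in rawdata:
    items = line.split()
    datadictionary[items[0]] = [float(item) for item in items[1:]]

# Row first, then column.
print(datadictionary['04'][column_index['C']])
```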

Jul 6 '07 #8

bvdet

2,851

Expert Mod 2GB


For simplicity, let us assume the data file looks like this:

PO A C G T
01 0.00 3.67 0.00 0.00
02 0.00 0.00 3.67 0.00
03 0.00 0.00 0.00 3.67
04 0.00 3.67 0.00 0.00
05 3.67 0.00 0.00 0.00
06 3.46 0.00 0.22 0.00
07 0.00 0.00 3.67 0.00
08 0.00 0.00 0.00 3.67
09 0.00 0.00 0.00 3.67
10 0.00 3.67 0.00 0.00
11 3.67 0.00 0.00 0.00
12 3.67 0.00 0.00 0.00
13 0.00 0.00 3.67 0.00
14 0.00 0.00 0.00 3.67
15 0.00 0.00 3.67 0.00
16 0.00 3.67 0.00 0.00

Create a dictionary of dictionaries:

fn = r'H:\TEMP\temsys\data7.txt'
lineList = [line.strip().split() for line in open(fn).readlines() if line != '\n']

headerList = lineList.pop(0)[1:]

# Key list
keys = [i[0] for i in lineList]
# Values list
values = [[float(s) for s in item] for item in [j[1:] for j in lineList]]

# Create a dictionary from keys and values
lineDict = dict(zip(keys, values))

dataDict = {}

for i, item in enumerate(headerList):
    dataDict[item] = {}
    for key in lineDict:
        dataDict[item][key] = lineDict[key][i]

>>> dataDict['A']['05']
3.6699999999999999
>>> globals().update(dataDict)
>>> A['05']
3.6699999999999999
>>>
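If injecting names into the module namespace with globals().update() feels too implicit, the same column-first lookup can be reached with ordinary assignments. The sketch below (Python 3 syntax, using two inline rows instead of the data file) also shows zip(*values) as one possible way to turn rows into columns; it is an alternative to the loop above, not the code used in this post.

```python
header_list = ['A', 'C', 'G', 'T']
keys = ['05', '06']
values = [[3.67, 0.0, 0.0, 0.0],
          [3.46, 0.0, 0.22, 0.0]]

# zip(*values) regroups the numbers column by column.
columns = zip(*values)

data_dict = {}
for name, column in zip(header_list, columns):
    data_dict[name] = dict(zip(keys, column))

A = data_dict['A']                        # explicit binding instead of globals().update()
print(A['05'], data_dict['G']['06'])
```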

Jul 6 '07 #9

aboxylica

111

100+

Hey,
I don't really understand what this line does. Can you please explain?
values = [[float(s) for s in item] for item in [j[1:] for j in lineList]]
and

for i, item in enumerate(headerList):
    dataDict[item] = {}
    for key in lineDict:
        dataDict[item][key] = lineDict[key][i]
I don't understand this either.
For this code, I am getting an error which says
ValueError: invalid literal for float(): A


Jul 7 '07 #10

bvdet

2,851

Expert Mod 2GB


The first question - The code is a list comprehension. Another way to write it would be:

values = []
for line in lineList:
    line = line[1:]
    tem = []
    for item in line:
        tem.append(float(item))
    values.append(tem)

The result:


 >>> [[0.0, 3.6699999999999999, 0.0, 0.0], [0.0, 0.0, 3.6699999999999999, 0.0], [0.0, 0.0, 0.0, 3.6699999999999999], [0.0, 3.6699999999999999, 0.0, 0.0], [3.6699999999999999, 0.0, 0.0, 0.0], [3.46, 0.0, 0.22, 0.0], [0.0, 0.0, 3.6699999999999999, 0.0], [0.0, 0.0, 0.0, 3.6699999999999999], [0.0, 0.0, 0.0, 3.6699999999999999], [0.0, 3.6699999999999999, 0.0, 0.0], [3.6699999999999999, 0.0, 0.0, 0.0], [3.6699999999999999, 0.0, 0.0, 0.0], [0.0, 0.0, 3.6699999999999999, 0.0], [0.0, 0.0, 0.0, 3.6699999999999999], [0.0, 0.0, 3.6699999999999999, 0.0], [0.0, 3.6699999999999999, 0.0, 0.0]]
 

Jul 7 '07 #11

bvdet

2,851

Expert Mod 2GB


The second question - This is 'headerList':

>>> headerList
['A', 'C', 'G', 'T']
>>>

The values in 'headerList' will be the keys of 'dataDict'. 'dataDict' will be the main dictionary, and its values will be subdictionaries. A dictionary key is associated with a value; in this case the value will be another dictionary. Variable 'keys' contains the subdictionary keys:

>>> keys
['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16']
>>>

'lineDict' is a temporary dictionary created to make it easier to compile the data in the form you wanted:


 >>> lineDict

{'02': [0.0, 0.0, 3.6699999999999999, 0.0], '03': [0.0, 0.0, 0.0, 3.6699999999999999], '13': [0.0, 0.0, 3.6699999999999999, 0.0], '01': [0.0, 3.6699999999999999, 0.0, 0.0], '06': [3.46, 0.0, 0.22, 0.0], '07': [0.0, 0.0, 3.6699999999999999, 0.0], '04': [0.0, 3.6699999999999999, 0.0, 0.0], '05': [3.6699999999999999, 0.0, 0.0, 0.0], '08': [0.0, 0.0, 0.0, 3.6699999999999999], '09': [0.0, 0.0, 0.0, 3.6699999999999999], '16': [0.0, 3.6699999999999999, 0.0, 0.0], '12': [3.6699999999999999, 0.0, 0.0, 0.0], '14': [0.0, 0.0, 0.0, 3.6699999999999999], '11': [3.6699999999999999, 0.0, 0.0, 0.0], '15': [0.0, 0.0, 3.6699999999999999, 0.0], '10': [0.0, 3.6699999999999999, 0.0, 0.0]}

>>>

Using 'enumerate' on 'headerList', Python gives me these values:

>>> for i, item in enumerate(headerList):
...     print i, item
...
0 A
1 C
2 G
3 T
>>>

Here's an interactive example showing what is happening inside the nested for loop:

>>> dataDict
{}
>>> key
'10'
>>> item
'T'
>>> lineDict

{'02': [0.0, 0.0, 3.6699999999999999, 0.0], '03': [0.0, 0.0, 0.0, 3.6699999999999999], '13': [0.0, 0.0, 3.6699999999999999, 0.0], '01': [0.0, 3.6699999999999999, 0.0, 0.0], '06': [3.46, 0.0, 0.22, 0.0], '07': [0.0, 0.0, 3.6699999999999999, 0.0], '04': [0.0, 3.6699999999999999, 0.0, 0.0], '05': [3.6699999999999999, 0.0, 0.0, 0.0], '08': [0.0, 0.0, 0.0, 3.6699999999999999], '09': [0.0, 0.0, 0.0, 3.6699999999999999], '16': [0.0, 3.6699999999999999, 0.0, 0.0], '12': [3.6699999999999999, 0.0, 0.0, 0.0], '14': [0.0, 0.0, 0.0, 3.6699999999999999], '11': [3.6699999999999999, 0.0, 0.0, 0.0], '15': [0.0, 0.0, 3.6699999999999999, 0.0], '10': [0.0, 3.6699999999999999, 0.0, 0.0]}

>>> lineDict[key][3]
0.0
>>> dataDict[item] = {}
>>> dataDict[item][key] = lineDict[key][3]
>>> dataDict
{'T': {'10': 0.0}}
>>>

I hope this helps you understand what is happening.

Jul 7 '07 #12

bvdet

2,851

Expert Mod 2GB

for this code,I am getting an error which says
value error:invalid literal for float():A

Check variable 'lineList'. It should look like this:

>>> for line in lineList:
...     print line
...
['01', '0.00', '3.67', '0.00', '0.00']
['02', '0.00', '0.00', '3.67', '0.00']
['03', '0.00', '0.00', '0.00', '3.67']
['04', '0.00', '3.67', '0.00', '0.00']
['05', '3.67', '0.00', '0.00', '0.00']
['06', '3.46', '0.00', '0.22', '0.00']
['07', '0.00', '0.00', '3.67', '0.00']
['08', '0.00', '0.00', '0.00', '3.67']
['09', '0.00', '0.00', '0.00', '3.67']
['10', '0.00', '3.67', '0.00', '0.00']
['11', '3.67', '0.00', '0.00', '0.00']
['12', '3.67', '0.00', '0.00', '0.00']
['13', '0.00', '0.00', '3.67', '0.00']
['14', '0.00', '0.00', '0.00', '3.67']
['15', '0.00', '0.00', '3.67', '0.00']
['16', '0.00', '3.67', '0.00', '0.00']
>>>

Jul 7 '07 #13

aboxylica

111

100+

The header list is supposed to take the first four letters, right? Like A, T, G, C. That is not happening,
and my lineList has the entire file (I mean including the A, T, G, C line).


Jul 8 '07 #14

aboxylica

111

100+

fn = 'half.txt'
fn_=open("half.txt","r")
file_content=fn_.readlines()

for line in file_content:
    linelist=line.strip().split()
    print linelist

    headerList = linelist.pop(0)[1:]
    print headerList

When I do this, my code produces a headerList which has
0
1
2
3
4
5
6
7
8
9
0
1
2
3
4
5
6
Why is this happening?
And what does strip() do?

Jul 8 '07 #15

aboxylica

111

100+

headerList = linelist.pop(0)[1:]
What does this line do?


Jul 8 '07 #16

bartonc

6,596

Expert 4TB

<snip>what does the strip( ) do?

strip() removes whitespace (or any characters you tell it to) from BOTH ends of a string. Like this:

>>> s = "   had_whitespace   "
>>> t = s.strip()
>>> print repr(t)  # repr() prints the string representation of a variable
'had_whitespace'
>>>
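As a small illustration of the "any characters you tell it to" part: strip() accepts an optional string of characters to remove, and lstrip() and rstrip() work on one end only. The sketch uses Python 3 syntax, and the sample strings are made up.

```python
line = "01\t0.00\t3.67\n"
print(repr(line.strip()))           # trailing newline (and any outer whitespace) is removed

marker = "//header//"
print(marker.strip("/"))            # strips the given characters from both ends

print("  padded  ".lstrip() + "|")  # left end only
print("  padded  ".rstrip() + "|")  # right end only
```

Note that whitespace inside the string, such as the tabs between fields, is left untouched; that is what split() is for.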

Jul 8 '07 #17

aboxylica

111

100+

headerList = linelist.pop(0)[1:]
What does this line do?


Jul 8 '07 #18

bartonc

6,596

Expert 4TB

headerList = linelist.pop(0)[1:]
WHAT DOES THIS LINE DO?

Since I didn't write that, I'll just have to do my best to describe it:
pop(0) removes and returns the zeroth element of linelist (which looks like another list). The result is stored in an unseen, temporary variable. [1:] takes all but the zeroth element from that temporary variable, and that (shortened list) is stored in headerList. You can check it out by doing something like

temp = linelist.pop(0)
print temp
headerList = temp[1:]   # this is called a "slice" from the temp list.
print headerList
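The result depends on what linelist holds. When it is a list of lists (one list per line of the file), pop(0) returns the header row and the slice drops its first field. When it is the field list of a single line, pop(0) returns a string and the slice drops that string's first character. A sketch in Python 3 syntax with made-up sample data:

```python
# Case 1: a list of lists, one inner list per line of the file.
all_lines = [['PO', 'A', 'C', 'G', 'T'], ['01', '0.00', '3.67', '0.00', '0.00']]
header_list = all_lines.pop(0)[1:]
print(header_list)                  # the header row without its first field

# Case 2: the fields of a single line.
one_line = ['01', '0.00', '3.67', '0.00', '0.00']
piece = one_line.pop(0)[1:]
print(piece)                        # a string: '01' without its first character
```

The second case would explain a headerList made of single characters when the pop is done inside a per-line loop.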

Hope that helps.

Jul 8 '07 #19

aboxylica

111

100+

Thanks for that. But could you give me your e-mail ID? I am not able to paste my exact file here and I am stuck with this problem. Please help!


Jul 8 '07 #20

bartonc

6,596

Expert 4TB


Paste what you can here and I'll format it for you.

Jul 8 '07 #21

aboxylica

111

100+


NA bap
PO A C G T
01 0.00 3.67 0.00 0.00
02 0.00 0.00 3.67 0.00
03 0.00 0.00 0.00 3.67
04 0.00 3.67 0.00 0.00
05 3.67 0.00 0.00 0.00
06 3.46 0.00 0.22 0.00
07 0.00 0.00 3.67 0.00
08 0.00 0.00 0.00 3.67
09 0.00 0.00 0.00 3.67
10 0.00 3.67 0.00 0.00
11 3.67 0.00 0.00 0.00
12 3.67 0.00 0.00 0.00
13 0.00 0.00 3.67 0.00
14 0.00 0.00 0.00 3.67
15 0.00 0.00 3.67 0.00
16 0.00 3.67 0.00 0.00
//
//
NA bcd
PO A C G T
01 42.55 8.75 145.86 8.14
02 0.14 0.53 204.64 0.00
03 126.83 78.02 0.11 0.34
04 0.21 0.17 0.00 204.92
05 0.00 12.38 0.43 192.50
06 174.48 0.95 1.32 28.56
07 79.53 4.70 100.44 20.64
//
//
NA bin
PO A C G T
01 0.45 8.27 0.00 11.39
02 0.00 0.00 10.02 10.09
03 5.80 1.39 0.00 12.93
04 12.33 5.18 2.60 0.00
05 12.43 0.00 0.00 7.68
06 18.55 0.00 1.57 0.00
07 0.05 0.58 0.00 19.48
08 20.11 0.00 0.00 0.00
09 20.06 0.05 0.00 0.00
10 20.11 0.00 0.00 0.00
11 0.00 15.33 0.00 4.78
12 20.06 0.05 0.00 0.00
13 14.99 0.35 4.78 0.00
14 13.64 2.42 3.37 0.68
15 5.03 0.00 15.08 0.00
16 7.23 0.45 10.94 1.49
//
//
I want to add one to each and every element of the file (except, of course, the first column, which gives the position),
and I want to access the elements like A[01], C[02], etc.

Jul 8 '07 #22

bartonc

6,596

Expert 4TB


Lacking your working program pasted here, I'm afraid that the best I can tell you is it should be as simple as:

temp = A['01']   # the keys are strings, so quote them
print temp
A['01'] = temp + 1
print A['01']

or even

print A['01']
A['01'] += 1
print A['01']

in a loop through your data structure.

Jul 8 '07 #23

aboxylica

111

100+

file_=open("half1.txt","r")
file_content=file_.readlines()
linelist=[line.strip().split() for line in file_content if line != '\n']

headerlist=linelist.pop(0)[1:]

keys=[i[0] for i in linelist]

values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]

linedict =dict(zip(keys,values))

datadict={}
for i,item in enumerate(headerlist):
    datadict[item]={}
    for key in linedict:
        print key
        datadict[item][key]=linedict[key][i]
        print datadict['A']['01']

When I execute it, I get an error that says
KeyError: '01'


Jul 8 '07 #24

bartonc

6,596

Expert 4TB


file_=open("half1.txt","r")
file_content=file_.readlines()
linelist=[line.strip().split() for line in file_content if line != '\n']

headerlist=linelist.pop(0)[1:]

keys=[i[0] for i in linelist]

values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]

linedict =dict(zip(keys,values))

# Form the data dictionary first:
datadict={}
for i,item in enumerate(headerlist):
    datadict[item]={}
    for key in linedict:
        datadict[item][key]=linedict[key][i]

# Then loop through the data structure. But since you have turned
# the data sideways, I can not see the structure at this moment.
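One likely reason for the KeyError: the subdictionaries are filled one key at a time (and, in the Python versions of that era, in no guaranteed order), so a print of datadict['A']['01'] placed inside the loop can run before the '01' entry exists. A minimal sketch in Python 3 syntax with two inline rows, building the structure first and reading it afterwards:

```python
headerlist = ['A', 'C', 'G', 'T']
linedict = {'02': [0.0, 0.0, 3.67, 0.0],
            '01': [0.0, 3.67, 0.0, 0.0]}

datadict = {}
for i, item in enumerate(headerlist):
    datadict[item] = {}
    for key in linedict:
        # Looking up datadict['A']['01'] at this point would fail on the first pass,
        # because '01' has not been stored yet when '02' is processed first.
        datadict[item][key] = linedict[key][i]

# Safe: every key has been stored by now.
print(datadict['A']['01'], datadict['C']['01'])
```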

Jul 8 '07 #25

aboxylica

111

100+

file_=open("half1.txt","r")
file_content=file_.readlines()
linelist=[line.strip().split() for line in file_content if line != '\n']

headerlist=linelist.pop(0)[1:]
print headerlist

keys=[i[0] for i in linelist]

values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]

linedict =dict(zip(keys,values))

datadict={}
for i,item in enumerate(headerlist):
    datadict[item]={}
    for key in linedict:
        print key
        datadict[item][key]=linedict[key][i]
        print datadict['A']['01']

This is my program. I am getting an error that says
KeyError: '01', and I do want to add one to all the elements
(I mean the float values).
Can you see the error?

Jul 8 '07 #26

bartonc

6,596

Expert 4TB


I've answered above.

You must learn to use code tags. Instructions are on the right hand side of the page when you are making your post (or reply). There is also a lot of helpful information in the How to ask a question section of our Posting Guidelines. Thanks

Jul 8 '07 #27

bvdet

2,851

Expert Mod 2GB


It looks like you have three data sets in the file. Which one do you want to work on?

Notice that I simplified my example code by eliminating some of the lines in the data file. It appears that you have not adjusted the code to account for that.

Jul 8 '07 #28

bvdet

2,851

Expert Mod 2GB

I have modified the first part of my example code to read the first data set in the OP file:

fn = r'H:\TEMP\temsys\data9.txt'
f = open(fn)

line = f.next()
while not line.startswith('PO'):
    line = f.next()

headerList = line.strip().split()[1:]
lineList = []

line = f.next().strip()
while not line.startswith('/'):
    if line != '':
        lineList.append(line.strip().split())
    line = f.next().strip()

f.close()

This will add a set amount to every element in the data:

# Add 1.0 to every element in dataDict subdictionaries
for keyMain in dataDict:
    for keySub in dataDict[keyMain]:
        dataDict[keyMain][keySub] += 1.0


 >>> dataDict

{'A': {'02': 1.0, '03': 1.0, '13': 1.0, '01': 1.0, '06': 4.46, '07': 1.0, '04': 1.0, '05': 4.6699999999999999, '08': 1.0, '09': 1.0, '10': 1.0, '16': 1.0, '11': 4.6699999999999999, '15': 1.0, '12': 4.6699999999999999, '14': 1.0}, 'C': {'02': 1.0, '03': 1.0, '13': 1.0, '01': 4.6699999999999999, '06': 1.0, '07': 1.0, '04': 4.6699999999999999, '05': 1.0, '08': 1.0, '09': 1.0, '10': 4.6699999999999999, '16': 4.6699999999999999, '11': 1.0, '15': 1.0, '12': 1.0, '14': 1.0}, 'T': {'02': 1.0, '03': 4.6699999999999999, '13': 1.0, '01': 1.0, '06': 1.0, '07': 1.0, '04': 1.0, '05': 1.0, '08': 4.6699999999999999, '09': 4.6699999999999999, '10': 1.0, '16': 1.0, '11': 1.0, '15': 1.0, '12': 1.0, '14': 4.6699999999999999}, 'G': {'02': 4.6699999999999999, '03': 1.0, '13': 4.6699999999999999, '01': 1.0, '06': 1.22, '07': 4.6699999999999999, '04': 1.0, '05': 1.0, '08': 1.0, '09': 1.0, '10': 1.0, '16': 1.0, '11': 1.0, '15': 4.6699999999999999, '12': 1.0, '14': 1.0}}

>>>
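
For readers without the earlier part of the thread: the loop above assumes dataDict is a dictionary of dictionaries, keyed first by column label and then by row label. Here is a minimal sketch of that layout, written for Python 3. The two rows and the names header, rows and data are made up for illustration and are not from the original code.

```python
# Hypothetical parsed input: column labels plus two data rows (all strings).
header = ["A", "C", "G", "T"]
rows = [
    ["01", "0.00", "3.67", "0.00", "0.00"],
    ["02", "0.00", "0.00", "3.67", "0.00"],
]

# Build {column label: {row label: float value}}.
data = {}
for col, base in enumerate(header):
    data[base] = {}
    for row in rows:
        data[base][row[0]] = float(row[col + 1])

# Add 1.0 to every element, once, after the structure is complete.
for base in data:
    for pos in data[base]:
        data[base][pos] += 1.0

print(data["A"]["01"])
print(data["C"]["01"])
```

The increment loop is kept outside the building loop so that each value is raised exactly once.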

Jul 8 '07 #29

aboxylica

111

100+

f=open("weight_matrix.transfac.txt","r")

# skip ahead to the header line that starts with 'PO'
line=f.next()
while not line.startswith('PO'):
    line=f.next()

headerlist=line.strip().split()[1:]
linelist=[]

# collect the data rows up to the terminating '//' line
line=f.next().strip()
while not line.startswith('/'):
    if line != '':
        linelist.append(line.strip().split())
    line=f.next().strip()

keys=[i[0] for i in linelist]
values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]

linedict=dict(zip(keys,values))
datadict={}
for i,item in enumerate(headerlist):
    datadict[item]={}
    for key in linedict:
        datadict[item][key]=linedict[key][i]

# add 1.0 to every element once the dictionary is complete
for keymain in datadict:
    for keysub in datadict[keymain]:
        datadict[keymain][keysub]+=1.0

print datadict

So here is the code that you suggested for creating dictionaries from a file (matrix).
Now I have to do something like this:
suppose the sequence entered is "ATATTA". A is in the first position, T in the second, and so on, so I need A[01]*T[02]*A[03]*T[04]*T[05]*A[06]=??
The sequence is going to be entered by the user every time (so it will keep changing).
How do I do this? What changes should I make? Hope I am clear!
This is the file containing the matrix:
NA bap
PO A C G T
01 0.00 3.67 0.00 0.00
02 0.00 0.00 3.67 0.00
03 0.00 0.00 0.00 3.67
04 0.00 3.67 0.00 0.00
05 3.67 0.00 0.00 0.00
06 3.46 0.00 0.22 0.00
07 0.00 0.00 3.67 0.00
08 0.00 0.00 0.00 3.67
09 0.00 0.00 0.00 3.67
10 0.00 3.67 0.00 0.00
11 3.67 0.00 0.00 0.00
12 3.67 0.00 0.00 0.00
13 0.00 0.00 3.67 0.00
14 0.00 0.00 0.00 3.67
15 0.00 0.00 3.67 0.00
16 0.00 3.67 0.00 0.00
//
//
NA bcd
PO A C G T
01 42.55 8.75 145.86 8.14
02 0.14 0.53 204.64 0.00
03 126.83 78.02 0.11 0.34
04 0.21 0.17 0.00 204.92
05 0.00 12.38 0.43 192.50
06 174.48 0.95 1.32 28.56
07 79.53 4.70 100.44 20.64
//
//
NA bin
PO A C G T
01 0.45 8.27 0.00 11.39
02 0.00 0.00 10.02 10.09
03 5.80 1.39 0.00 12.93
04 12.33 5.18 2.60 0.00
05 12.43 0.00 0.00 7.68
06 18.55 0.00 1.57 0.00
07 0.05 0.58 0.00 19.48
08 20.11 0.00 0.00 0.00
09 20.06 0.05 0.00 0.00
10 20.11 0.00 0.00 0.00
11 0.00 15.33 0.00 4.78
12 20.06 0.05 0.00 0.00
13 14.99 0.35 4.78 0.00
14 13.64 2.42 3.37 0.68
15 5.03 0.00 15.08 0.00
16 7.23 0.45 10.94 1.49
//
//

Jul 9 '07 #30

bartonc

6,596

Expert 4TB

HERE IS MY code!!plz help how to proceed??

Thanks for using code tags.
First thing to do is try it. It looks like it should work.

Right after that, READ the rest of the POSTING GUIDELINES. You have committed several infractions and must consider yourself warned.

Jul 9 '07 #32

aboxylica

111

100+

I am sorry about that. I will follow the rules henceforth, but I don't know how to proceed further with my problem.

Jul 9 '07 #33

bartonc

6,596

Expert 4TB

Ok. All is forgiven. Here's the thing. I can't give you working code.
I can give you examples of what I see going on here so that you can try things on your own. It looks to me like you could multiply the specified elements (after your matrix has been created) like this:

>>> seq = "ATATTA"
>>> res = 1
>>> for i, key in enumerate(seq):
...     res *= datadict[key]["%02d" % (i + 1)]
...
>>> print res
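
To make the idea concrete, here is a small self-contained sketch of the same multiplication, written for Python 3. The tiny matrix and the function name sequence_product are hypothetical, chosen only so the result is easy to check by hand.

```python
# Hypothetical two-position matrix in the same nested-dict layout as the thread:
# outer key = base, inner key = zero-padded position label.
matrix = {
    "A": {"01": 2.0, "02": 3.0},
    "T": {"01": 5.0, "02": 7.0},
}


def sequence_product(seq, table):
    """Multiply table[base][position] over the sequence, positions starting at '01'."""
    res = 1.0
    for i, base in enumerate(seq):
        # enumerate counts from 0, the file counts from 01
        res *= table[base]["%02d" % (i + 1)]
    return res


print(sequence_product("AT", matrix))  # A[01] * T[02]
```

A sequence longer than the matrix, or a letter that is not a column label, raises KeyError here, which is probably what you want for user-entered input.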

Jul 9 '07 #34

aboxylica

111

100+

Thanks a lot for all the help! I still have a small doubt. What does this loop do?

while not line.startswith('PO'):
    line=f.next()
    print line

I mean, this is supposed to refer to the lines which don't start with 'PO', right? But when I say print line, it is printing
PO A C G T
but this is not what it means, right?

Jul 9 '07 #35

bvdet

2,851

Expert Mod 2GB

f.next() merely advances one line in the file. The object is to advance to the first line that starts with "PO".
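
A rough sketch of what the loop does, using a list of strings in place of the open file so it can be run on its own. It is written for Python 3, where the built-in next(f) replaces f.next(). The names lines and skipped are only for illustration.

```python
# Stand-in for the open file: an iterator over lines.
lines = iter(["NA bap\n", "PO A C G T\n", "01 0.00 3.67 0.00 0.00\n"])

skipped = []
line = next(lines)
# Keep reading for as long as the current line does NOT start with 'PO'.
while not line.startswith("PO"):
    skipped.append(line)
    line = next(lines)

# The loop has stopped, so 'line' now holds the header line itself.
header = line.strip().split()[1:]
print(skipped)
print(header)
```

That is why a print placed inside the loop ends up showing the 'PO' line: it prints whatever was just read, and the last thing read before the loop stops is the header.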

I asked you a question in an earlier post about your data. It looks like there are three data sets in your data file. The example code I posted only parses the first data set.

Jul 9 '07 #36

aboxylica

111

100+

Okay, then why do we say
while not? It should be while, right?
And yes, there are actually a lot of data sets in my file, around 15 to 20. First I am trying to make it work for one, then I will do it for the rest. (Sorry for not answering, I am really going crazy with the program. I owe you an apology.)
And I have one more doubt:

datadict={}
for i,item in enumerate(headerlist):
    datadict[item]={}
    for key in linedict:
        print key
        # when I print key here, it prints the keys twice and in no particular order. Why is this happening?
        datadict[item][key]=linedict[key][i]
for keymain in datadict:
    for keysub in datadict[keymain]:
        datadict[keymain][keysub]+=1.0
print datadict

Looking forward to your reply!
cheers!

Jul 9 '07 #37

bvdet

2,851

Expert Mod 2GB

It should be 'while not'. The loop keeps skipping lines for as long as the current line does not start with 'PO', so it stops on the first line that does. You can use this same method to advance into later data sets.
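
One possible way to carry that over to every data set in the file is sketched below for Python 3. It assumes each block has an 'NA name' line, a 'PO' header line, data rows, and a terminating '//' line, as in the posted file. The function name parse_matrices and the shortened sample are hypothetical.

```python
def parse_matrices(lines):
    """Return {name: {base: {position: value}}} for every NA/PO block."""
    matrices = {}
    name, header, current = None, None, None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("NA"):
            name = line.split()[1]
        elif line.startswith("PO"):
            header = line.split()[1:]
            current = {base: {} for base in header}
        elif line.startswith("/"):
            # First '//' closes the block; the second one finds nothing open.
            if current is not None:
                matrices[name] = current
            header, current = None, None
        elif current is not None:
            fields = line.split()
            for base, text in zip(header, fields[1:]):
                current[base][fields[0]] = float(text)
    return matrices


# Shortened sample in the same layout as the posted file.
sample = """NA bap
PO A C G T
01 0.00 3.67 0.00 0.00
02 0.00 0.00 3.67 0.00
//
//
NA bcd
PO A C G T
01 42.55 8.75 145.86 8.14
//
//
""".splitlines()

matrices = parse_matrices(sample)
print(sorted(matrices))
print(matrices["bcd"]["G"]["01"])
```

Passing an open file object instead of the sample list should work the same way, since both are iterated line by line.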

No problem about not answering.

Dictionaries are unordered collections of data. You can print in an orderly fashion like this:

keys = lineDict.keys()
keys.sort()
for key in keys:
    print '%s = %s' % (key, lineDict[key])
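
If you are on Python 3, keys() no longer returns a list, so keys.sort() is not available. A sketch of the same idea using the built-in sorted() follows; the two-row linedict is a made-up sample.

```python
# Hypothetical two-row sample, deliberately inserted out of order.
linedict = {
    "02": [0.0, 0.0, 3.67, 0.0],
    "01": [0.0, 3.67, 0.0, 0.0],
}

# sorted() returns a new list of the keys in ascending order.
ordered = sorted(linedict)
for key in ordered:
    print("%s = %s" % (key, linedict[key]))
```

Because the keys are zero-padded strings of equal width, plain string sorting gives the same order as numeric sorting.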

Jul 9 '07 #38

aboxylica

111

100+

my file:
NA bap
PO A C G T
01 0.00 3.67 0.00 0.00
02 0.00 0.00 3.67 0.00
03 0.00 0.00 0.00 3.67
04 0.00 3.67 0.00 0.00
05 3.67 0.00 0.00 0.00
06 3.46 0.00 0.22 0.00
07 0.00 0.00 3.67 0.00
08 0.00 0.00 0.00 3.67
09 0.00 0.00 0.00 3.67
10 0.00 3.67 0.00 0.00
11 3.67 0.00 0.00 0.00
12 3.67 0.00 0.00 0.00
13 0.00 0.00 3.67 0.00
14 0.00 0.00 0.00 3.67
15 0.00 0.00 3.67 0.00
16 0.00 3.67 0.00 0.00
//
//

This is my code:


 
f=open("deeps1.txt","r")

line=f.next()

while not line.startswith('PO'):

    line=f.next()
 
headerlist=line.strip().split()[1:]

linelist=[]
 
line=f.next().strip()

while not line.startswith('/'):

    if line != '':

        linelist.append(line.strip().split())

    line=f.next().strip()
 
keys=[i[0] for i in linelist]

values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]

array={}

linedict=dict(zip(keys,values))

keys = linedict.keys()

keys.sort()

for key in keys:

    array=[key,linedict[key]]
 
datadict={}

datadict1={}

for i,item in enumerate(headerlist):

    datadict[item]={}

    print datadict[item]#this one returns empty dictionary

    for key_ in linedict:

        print item# all items are getting printed

        datadict[item][key_]=linedict[key_][i]

        # but for the print statement below its saying key error:'C'

        print datadict['C']['01']

I have written the problems in comment form in the code!

can you execute my code and see why this is happening? I have been stuck with this for a day now.
waiting for your reply
cheers!

Jul 9 '07 #39

bartonc

6,596

Expert 4TB

For that to work inside the loop, print each new addition to datadict instead. datadict['C'] has not been created yet while the loop is still filling datadict['A'].

        # print every item going into the datadict because 'C' has not been created yet
        print linedict[key_][i]

Jul 9 '07 #40

bvdet

2,851

Expert Mod 2GB

I made a few minor changes to your code:

f=open("deeps1.txt","r")

line=f.next()
while not line.startswith('PO'):
    line=f.next()

headerlist=line.strip().split()[1:]
linelist=[]

line=f.next().strip()
while not line.startswith('/'):
    if line != '':
        linelist.append(line.strip().split())
    line=f.next().strip()

keys=[i[0] for i in linelist]
values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]

array={}
linedict=dict(zip(keys,values))
keys = linedict.keys()
keys.sort()

# initialize list object 'array'
array = []
for key in keys:
    # append each item to list object
    array.append([key,linedict[key]])

# initialize dictionary
datadict={}

for i,item in enumerate(headerlist):
    datadict[item]={}
    # print 'item' here, datadict[item] is empty at this point
    print item
    for key_ in linedict:
        datadict[item][key_]=linedict[key_][i]

# Change indentation level
print datadict['C']['01']
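
The indentation change matters because of when each key comes into existence. Here is a small sketch for Python 3 that records whether 'C' is present during the first pass; the two-row linedict and the flag name c_exists_during_first_pass are made up for the illustration.

```python
headerlist = ["A", "C", "G", "T"]
linedict = {
    "01": [0.0, 3.67, 0.0, 0.0],
    "02": [0.0, 0.0, 3.67, 0.0],
}

datadict = {}
c_exists_during_first_pass = None
for i, item in enumerate(headerlist):
    datadict[item] = {}
    if item == "A":
        # Only 'A' has been created so far; datadict['C'] here would raise KeyError.
        c_exists_during_first_pass = "C" in datadict
    for key_ in linedict:
        datadict[item][key_] = linedict[key_][i]

# After the loops every column exists, so the lookup is safe.
print(c_exists_during_first_pass)
print(datadict["C"]["01"])
```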

Jul 9 '07 #41

aboxylica

111

100+

Sorry,
that doesn't seem to do anything.

 
f=open("deeps1.txt","r")

line=f.next()

while not line.startswith('PO'):

    line=f.next()
 
headerlist=line.strip().split()[1:]

linelist=[]
 
line=f.next().strip()

while not line.startswith('/'):

    if line != '':

        linelist.append(line.strip().split())

    line=f.next().strip()
 
keys=[i[0] for i in linelist]

values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]

array={}

linedict=dict(zip(keys,values))

keys = linedict.keys()

keys.sort()

for key in keys:

    array=[key,linedict[key]]
 
datadict={}

datadict1={}

for i,item in enumerate(headerlist):

    datadict[item]={}

    print datadict[item]

    for key_ in linedict:
 
        datadict[item][key_]=linedict[key_][i]

        print linedict[key][i]#this is what i added and it still gives the same answer

        print datadict['C']['01']

:( or should i add it somewhere else?
waiting for ur reply
cheers!

Jul 9 '07 #42

bvdet

2,851

Expert Mod 2GB

CODE TAGS! This code:

for i,item in enumerate(headerlist):
    datadict[item]={}
    print item
    for key_ in linedict:
        datadict[item][key_]=linedict[key_][i]
        print linedict[key_][i]

print datadict['C']['01']

gives me this output:

Jul 9 '07 #43

aboxylica

111

100+

Thank you sir! That works for me too. But what I want when I type
datadict['C']['01'] is 3.67 (according to my file, which is printed below), and it is printing the entire thing.
This is what my file says:
NA bap
PO A C G T
01 0.00 3.67 0.00 0.00
02 0.00 0.00 3.67 0.00
03 0.00 0.00 0.00 3.67
04 0.00 3.67 0.00 0.00
05 3.67 0.00 0.00 0.00
06 3.46 0.00 0.22 0.00
07 0.00 0.00 3.67 0.00
08 0.00 0.00 0.00 3.67
09 0.00 0.00 0.00 3.67
10 0.00 3.67 0.00 0.00
11 3.67 0.00 0.00 0.00
12 3.67 0.00 0.00 0.00
13 0.00 0.00 3.67 0.00
14 0.00 0.00 0.00 3.67
15 0.00 0.00 3.67 0.00
16 0.00 3.67 0.00 0.00
//
//
so how do I change the code accordingly?


 
f=open("deeps1.txt","r")

line=f.next()

while not line.startswith('PO'):

    line=f.next()
 
headerlist=line.strip().split()[1:]

linelist=[]
 
line=f.next().strip()

while not line.startswith('/'):

    if line != '':

        linelist.append(line.strip().split())

    line=f.next().strip()
 
keys=[i[0] for i in linelist]

values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]

array={}

linedict=dict(zip(keys,values))

keys = linedict.keys()

keys.sort()

for key in keys:

    array=[key,linedict[key]]
 
datadict={}

datadict1={}

for i,item in enumerate(headerlist):

    datadict[item]={}

    print item

    for key_ in linedict:

        datadict[item][key_]=linedict[key_][i]

        print linedict[key_][i] 

print datadict['C']['01']

waiting,
cheers!

Jul 9 '07 #44

bvdet

2,851

Expert Mod 2GB

I tested your code exactly as you posted it:

>>> datadict['C']['01']
3.6699999999999999
>>>

The lookup itself returns a single value. The rest of what you see comes from the other print statements in the code (print item and print linedict[key_][i]).

Jul 9 '07 #45

aboxylica

111

100+

Thanks for all the help, I wouldn't have come this far without the help of all of you.
There is a new problem:

 
f=open("deeps1.txt","r")

line=f.next()

while not line.startswith('PO'):

    line=f.next()
 
headerlist=line.strip().split()[1:]

linelist=[]
 
line=f.next().strip()

while not line.startswith('/'):

    if line != '':

        linelist.append(line.strip().split())

    line=f.next().strip()
 
keys=[i[0] for i in linelist]

values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]

array={}

linedict=dict(zip(keys,values))

keys = linedict.keys()

keys.sort()

for key in keys:

    array=[key,linedict[key]]
 
datadict={}

datadict1={}

for i,item in enumerate(headerlist):

    datadict[item]={}

    for key_ in linedict:

        datadict[item][key_]=linedict[key_][i]
 
for keymain in datadict:

    for keysub in datadict[keymain]:

        datadict[keymain][keysub]+=1.0

#print datadict['T']['16']

seq="ATA"

res=1

for i in range(1,len(seq)):

    key=seq[i]

    for keymain in datadict:

        if keymain==key:

            print key,i

  #print datadict[key]

#print res

This is the code. As I already posted, I want to compute something like
A[01]*T[02]*A[03]
But the problem I am facing is that datadict has keys like "01", "02", while in the loop over seq I have the integers 1, 2, 3, 4, and I can't start from zero. How can I make the loop over my seq produce 01, 02, etc.? If I say
for i in range('01',len(seq)):
it is taking it as a string!
Waiting for your reply,
cheers!!

Jul 10 '07 #46

aboxylica

111

100+

oops am sorry!
The one you suggested is working:
this is the code:


 
f=open("deeps1.txt","r")

line=f.next()

while not line.startswith('PO'):

    line=f.next()
 
headerlist=line.strip().split()[1:]

linelist=[]
 
line=f.next().strip()

while not line.startswith('/'):

    if line != '':

        linelist.append(line.strip().split())

    line=f.next().strip()
 
keys=[i[0] for i in linelist]

values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]

array={}

linedict=dict(zip(keys,values))

keys = linedict.keys()

keys.sort()

for key in keys:

    array=[key,linedict[key]]
 
datadict={}

datadict1={}

for i,item in enumerate(headerlist):

    datadict[item]={}

    for key_ in linedict:

        datadict[item][key_]=linedict[key_][i]
 
for keymain in datadict:

    for keysub in datadict[keymain]:

        datadict[keymain][keysub]+=1.0

#print datadict['T']['16']

seq="ATATT"

res=1

for i,key in enumerate (seq):

    res*=datadict[key]["%02d"%(i+1)]# I dont understand this line.the formatting especially

print res

This seems to work for the first letter "A" only.
waiting for your reply,
cheers!
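
On the formatting question in the code comment: "%02d" % n turns an integer into a decimal string at least two characters wide, padded on the left with zeros, which is how the integer loop index is matched to the string keys "01", "02" and so on. A short sketch for Python 3 follows; the names labels, same and pairs are only for illustration.

```python
# Zero-padded labels for positions 1..10 using printf-style formatting.
labels = ["%02d" % (i + 1) for i in range(10)]

# str.zfill(2) is another standard way to get the same padding.
same = [str(i + 1).zfill(2) for i in range(10)]

# enumerate() starts at 0, so i + 1 gives the 1-based position in the file.
seq = "ATA"
pairs = [(base, "%02d" % (i + 1)) for i, base in enumerate(seq)]

print(labels)
print(pairs)
```

Each pair is exactly the (outer key, inner key) combination used in datadict[key]["%02d" % (i + 1)].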

Jul 10 '07 #47

aboxylica

111

100+

sorry!!
its working:)
cheers!!

Jul 10 '07 #48

aboxylica

111

100+

I am doing a simple task here and I am getting an error. I can't understand why it is happening!
I am trying to find a score for a sequence after creating all the dictionaries:
if the seq="ACGT"
value of A and T is 0.3
value of C and G is 0.2
score=val(A)*val(C)*val(G)*val(T)
so it should be score=0.3*0.2*0.2*0.3=0.0036
My error is mentioned in comment form in the last lines of the code.
This is my code:

 
f=open("deeps1.txt","r")

line=f.next()

while not line.startswith('PO'):

    line=f.next()
 
headerlist=line.strip().split()[1:]

linelist=[]
 
line=f.next().strip()

while not line.startswith('/'):

    if line != '':

        linelist.append(line.strip().split())

    line=f.next().strip()
 
keys=[i[0] for i in linelist]

values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]

array={}

linedict=dict(zip(keys,values))

keys = linedict.keys()

keys.sort()

for key in keys:

    array=[key,linedict[key]]
 
datadict={}

datadict1={}

for i,item in enumerate(headerlist):

    datadict[item]={}

    for key_ in linedict:

        datadict[item][key_]=linedict[key_][i]
 
for keymain in datadict:

    for keysub in datadict[keymain]:

        datadict[keymain][keysub]+=1.0

#print datadict['T']['16']

seq="CGTCAG"
 
res=1

for i in range(0,len(seq)):

    key=seq[i]

    res*=datadict[key]["%02d"%(i+1)]

    #print res

    score=1

    value={"A":"0.3","T":"0.3","C":"0.2","G":"0.2"}

    for it in value:

        for item in seq:

            if it==key:

                score=score*value[it]

                print score# I get an error that says TypeError: can't multiply sequence by non-int of type 'str'
 
print res

waiting 4 ur reply
cheers!

Jul 10 '07 #49

bvdet

2,851

Expert Mod 2GB

You are receiving the error because the values in dictionary 'value' are strings. Either define them as numbers (e.g. "A":0.3,"T":0.3,....) or convert to float in the calculation:


score=score*float(value[it])
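
As a rough sketch of the whole score calculation in Python 3, the function below multiplies one weight per letter of the sequence. The name background_score is hypothetical, and the single loop over seq is my assumption about what the nested loops were meant to compute. The second half shows why the error only appears partway through: 1 * "0.3" is legal string repetition, and the failure comes when that string is multiplied by another string.

```python
value = {"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2}


def background_score(seq, weights):
    """Product of one weight per letter; float() also accepts string weights."""
    score = 1.0
    for base in seq:
        score *= float(weights[base])
    return score


score = background_score("ACGT", value)
print(score)

# Reproducing the TypeError with string weights.
try:
    broken = 1 * "0.3"        # int * str repeats the string once: "0.3"
    broken = broken * "0.3"   # str * str is not defined
    raised = False
except TypeError:
    raised = True
print(raised)
```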

Jul 10 '07 #50
