UnfoldCDL
Documentation for UnfoldCDL.
UnfoldCDL.find_motif
UnfoldCDL.find_motif_fasta_folder
UnfoldCDL.obtain_count_matrices
UnfoldCDL.pvalue2score
UnfoldCDL.str2freq
UnfoldCDL.find_motif
— Method
This function performs motif discovery on the FASTA file input_fasta and stores the result in the folder output_folder.
Input: input_fasta: a string that specifies the path of the FASTA file.
Output: output_folder: a string that specifies the path of the folder that will contain the motif discovery result for input_fasta. The folder is created automatically if it does not exist.
Example: find_motif("/home/shane_chu/Desktop/example.fa", "/home/shane_chu/Desktop/")
If you want to find motif co-occurrence (experimental; gives only a conservative estimate),
set co_occurrence_results=true, i.e. execute
find_motif("/home/shane_chu/Desktop/example.fa", "/home/shane_chu/Desktop/"; co_occurrence_results=true)
UnfoldCDL.find_motif_fasta_folder
— Method
This function will
- Search for all the files in the folder fasta_folder with file extension .fa.
- Perform a motif discovery on each of those files.
- Output the motif discovery results to the folder output_folder.
- Create output_folder automatically if it does not exist.
Input: fasta_folder: a string that specifies the path of a folder that contains fasta files for motif discovery.
Output: output_folder: a string that specifies the path of the folder that will contain the motif discovery results.
Example: find_motif_fasta_folder("/home/shane_chu/Desktop/fasta_folder", "/home/shane_chu/Desktop/fasta_folder/results")
If you want to find motif co-occurrence (experimental; gives only a conservative estimate),
set co_occurrence_results=true, i.e. execute
find_motif_fasta_folder("/home/shane_chu/Desktop/fasta_folder", "/home/shane_chu/Desktop/fasta_folder/results"; co_occurrence_results=true)
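The folder workflow listed above (find every .fa file, process each one, create the output folder if needed) can be sketched with Julia's standard library alone. This is only an illustration, not the UnfoldCDL implementation: list_fasta_files and for_each_fasta are hypothetical helpers, and giving each input file its own subfolder under output_folder is an assumption about the result layout.

```julia
# Hypothetical helpers for illustration; they are not part of UnfoldCDL.

# Return the full paths of the regular files in `fasta_folder` that end in ".fa".
function list_fasta_files(fasta_folder::AbstractString)
    paths = readdir(fasta_folder; join=true)   # join=true yields full paths
    return sort(filter(p -> isfile(p) && endswith(p, ".fa"), paths))
end

# Call `process(fasta_path, destination)` once per FASTA file.
# Assumption: each file gets its own subfolder named after the file.
function for_each_fasta(process::Function, fasta_folder::AbstractString,
                        output_folder::AbstractString)
    mkpath(output_folder)                      # creates the folder if it is missing
    for fasta in list_fasta_files(fasta_folder)
        name = first(splitext(basename(fasta)))
        process(fasta, joinpath(output_folder, name))
    end
end
```

Because the callback is the first argument, a do-block works: `for_each_fasta(folder, out) do fasta, dest ... end`, where the body could call find_motif(fasta, dest) if a per-file layout is what you want.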
UnfoldCDL.obtain_count_matrices
— Method
Z: the sparse code returned from the training
data: dataset
flen: the length of the filters (they are all of the same length)
UnfoldCDL.pvalue2score
— Method
pval2score(pwm, pval, ϵ=1e-1, k=10, bg=[.25,.25,.25,.25])
Returns the highest score threshold score(M, pval) of a pwm such that the p-value is greater than or equal to pval.
Input:
pwm: a 4 x m matrix
pval: a p-value; e.g. pval = 1e-3
ϵ: initial granularity (optional)
k: refinement parameter (optional)
bg: multinomial background (optional)
Output:
alpha: the highest score threshold
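To make the relationship between a p-value and a score threshold concrete, here is a brute-force sketch. It assumes the usual definition (the p-value of a threshold is the probability that a sequence drawn from the background scores at least that threshold) and that the rows of pwm are ordered A, C, G, T. It is not the algorithm used by the package, which takes the granularity and refinement parameters ϵ and k; brute_force_threshold is a hypothetical name, and enumerating all 4^m sequences is only practical for short motifs.

```julia
# Brute-force sketch, not the UnfoldCDL implementation.
# Assumes pwm rows are ordered A, C, G, T and bg gives their background probabilities.
function brute_force_threshold(pwm::AbstractMatrix{<:Real}, pval::Real;
                               bg::AbstractVector{<:Real}=fill(0.25, 4))
    size(pwm, 1) == 4 || throw(ArgumentError("pwm must have 4 rows"))
    m = size(pwm, 2)
    scored = Tuple{Float64,Float64}[]          # (score, background probability)
    for seq in Iterators.product(ntuple(_ -> 1:4, m)...)
        score = sum(pwm[seq[j], j] for j in 1:m)
        prob = prod(bg[seq[j]] for j in 1:m)
        push!(scored, (score, prob))
    end
    sort!(scored; by=first, rev=true)          # highest scores first
    tail = 0.0                                 # running P(score >= s)
    i = 1
    while i <= length(scored)
        s = scored[i][1]
        # Add the probability of every sequence tied at score s.
        while i <= length(scored) && scored[i][1] == s
            tail += scored[i][2]
            i += 1
        end
        tail >= pval && return s               # first (highest) s whose p-value reaches pval
    end
    return scored[end][1]                      # pval above 1: fall back to the lowest score
end
```

Ties are detected with exact floating-point equality, which is adequate for a sketch but may need a tolerance for real-valued log-odds matrices.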
UnfoldCDL.str2freq
— Method
data: (to be added)
pos: positions and the sequence; a vector of [(start₁:end₁, seqnum₁), (start₂:end₂, seqnum₂), ...]
ps: pseudocount
Returns a position frequency matrix, for example:
     1    2    3    4
A   0.1  0.1  ...
C   0.2  0.1  ...
G   0.3  0.1  ...
T   0.4  0.7  ...
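As an illustration of how a pseudocount enters a position frequency matrix, the sketch below builds one from aligned sites of equal length. It is not the package's str2freq: the layout of data is not documented here, so the hypothetical helper frequency_matrix takes the site strings directly instead of data and pos, and its default pseudocount is an arbitrary choice.

```julia
# Hypothetical helper for illustration; not the UnfoldCDL implementation.
# Rows of the result are ordered A, C, G, T; columns are motif positions.
function frequency_matrix(sites::AbstractVector{<:AbstractString}; ps::Real=0.01)
    isempty(sites) && throw(ArgumentError("need at least one site"))
    m = length(first(sites))
    all(s -> length(s) == m, sites) || throw(ArgumentError("sites must share one length"))
    row = Dict('A' => 1, 'C' => 2, 'G' => 3, 'T' => 4)
    counts = fill(float(ps), 4, m)             # every cell starts at the pseudocount
    for s in sites
        for (j, ch) in enumerate(uppercase(s))
            # Letters outside A, C, G, T (such as N) are skipped.
            haskey(row, ch) && (counts[row[ch], j] += 1)
        end
    end
    return counts ./ sum(counts; dims=1)       # normalise each column to sum to 1
end
```

For example, `frequency_matrix(["ACGT", "ACGA", "ACGT", "TCGT"]; ps=0)` gives 0.75 for A and 0.25 for T in the first column.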