UnfoldCDL

Documentation for UnfoldCDL.

UnfoldCDL.find_motif (Method)

This function performs motif discovery on the specified fasta file, input_fasta, and stores the results in the folder output_folder.

Input:

  • input_fasta: a string that specifies the path of the fasta file.

Output:

  • output_folder: a string that specifies the path of the folder that will contain the motif discovery results for input_fasta. The folder is created automatically if it does not exist.

Example:
    find_motif("/home/shane_chu/Desktop/example.fa", "/home/shane_chu/Desktop/")

If you want to find motif co-occurrences (experimental; gives only a conservative estimate), set co_occurrence_results=true, i.e., execute
    find_motif("/home/shane_chu/Desktop/example.fa", "/home/shane_chu/Desktop/"; co_occurrence_results=true)
UnfoldCDL.find_motif_fasta_folder (Method)

This function will

  1. Search for all the files in the folder fasta_folder with file extension .fa.
  2. Perform a motif discovery on each of those files.
  3. Output the motif discovery results to the folder output_folder.
    • It will automatically create output_folder if output_folder does not exist.

Input: fasta_folder: a string that specifies the path of a folder that contains fasta files for motif discovery.

Output: output_folder: a string that specifies the path of the folder that will contain the motif discovery results.

Example:
    find_motif_fasta_folder("/home/shane_chu/Desktop/fasta_folder", "/home/shane_chu/Desktop/fasta_folder/results")

If you want to find motif co-occurrences (experimental; gives only a conservative estimate), set co_occurrence_results=true, i.e., execute
    find_motif_fasta_folder("/home/shane_chu/Desktop/fasta_folder", "/home/shane_chu/Desktop/fasta_folder/results"; co_occurrence_results=true)
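The three steps above (collect the .fa files, run discovery on each, write results under output_folder) can be sketched in plain Julia as follows. This is only an illustration of the control flow, not the package's implementation: run_on_fasta_folder and the discover callback are hypothetical names, and giving each input file its own result subfolder named after the file is an assumption.

```julia
# Hypothetical sketch of the folder-level driver; `discover(fasta_path, outdir)`
# stands in for the per-file motif discovery and is NOT part of UnfoldCDL's API.
function run_on_fasta_folder(discover::Function, fasta_folder::AbstractString,
                             output_folder::AbstractString)
    mkpath(output_folder)  # create output_folder if it does not exist
    # Step 1: regular files in fasta_folder whose name ends in ".fa" (readdir sorts names)
    fa_files = filter(f -> endswith(f, ".fa") && isfile(joinpath(fasta_folder, f)),
                      readdir(fasta_folder))
    outdirs = String[]
    for f in fa_files
        # Assumption: one result subfolder per input file, named after the file
        outdir = joinpath(output_folder, splitext(f)[1])
        mkpath(outdir)
        # Step 2 and 3: run discovery on this file and write into its subfolder
        discover(joinpath(fasta_folder, f), outdir)
        push!(outdirs, outdir)
    end
    return outdirs
end
```

Taking the callback as the first argument lets a caller use Julia's do-block syntax, e.g. `run_on_fasta_folder(src, dst) do fa, outdir ... end`.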
UnfoldCDL.pvalue2score (Method)
pvalue2score(pwm, pval, ϵ=1e-1, k=10, bg=[.25,.25,.25,.25])

Returns the highest score threshold Score(M, pval) of a PWM M such that the p-value of that threshold is greater than or equal to pval.
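Written out, and assuming the usual additive PWM score of a length-m sequence x under an i.i.d. background bg (this formalization follows the standard TFM-Pvalue style definition and is an assumption about what the function computes), the returned threshold is:

```latex
S_M(x) = \sum_{j=1}^{m} M_{x_j,\,j}, \qquad
\mathrm{P\text{-}value}(M,\alpha) = \Pr_{x \sim \mathrm{bg}^{m}}\!\left[\, S_M(x) \ge \alpha \,\right],
\qquad
\mathrm{Score}(M,\mathrm{pval}) = \max\{\, \alpha : \mathrm{P\text{-}value}(M,\alpha) \ge \mathrm{pval} \,\}.
```

Because S_M takes finitely many values, P-value(M, α) is a non-increasing step function of α, so the maximum exists and is attained at one of the achievable scores.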

Input:

  • pwm: a 4 x m matrix
  • pval: a p-value; e.g. pval = 1e-3
  • ϵ: initial granularity (optional)
  • k: Refinement parameter (optional)
  • bg: multinomial background (optional)

Output:

  • alpha: the highest score-threshold
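A minimal sketch of the idea at a single granularity ϵ, under the additive-score assumption: round every PWM entry down to a multiple of ϵ, compute the exact distribution of the rounded score by dynamic programming over columns, then scan scores from high to low until the tail probability reaches pval. The name pval2score_sketch is hypothetical, and the sketch deliberately omits the refinement controlled by k, so it is not the package's algorithm. Since rounding down never raises a score, the returned value is a lower bound on the exact threshold.

```julia
# Hypothetical single-granularity sketch; no k-step refinement is performed.
function pval2score_sketch(pwm::AbstractMatrix{<:Real}, pval::Real,
                           ϵ::Real = 0.1, bg::AbstractVector{<:Real} = fill(0.25, 4))
    size(pwm, 1) == 4 || throw(ArgumentError("pwm must be a 4 x m matrix"))
    q = floor.(Int, pwm ./ ϵ)          # entries rounded down, in units of ϵ
    dist = Dict(0 => 1.0)              # rounded score => probability under bg
    for j in 1:size(q, 2)
        next = Dict{Int,Float64}()
        for (s, p) in dist, a in 1:4   # extend every partial score by each base
            key = s + q[a, j]
            next[key] = get(next, key, 0.0) + p * bg[a]
        end
        dist = next
    end
    tail = 0.0                         # running P(S >= s) as s decreases
    for s in sort(collect(keys(dist)); rev = true)
        tail += dist[s]
        tail >= pval && return s * ϵ   # highest s whose tail reaches pval
    end
    return minimum(keys(dist)) * ϵ
end
```

For a single column with scores 1, 2, 3, 4 and a uniform background, each score has probability 0.25, so pval = 0.5 gives the threshold 3.0.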
UnfoldCDL.str2freq (Method)

Input:

  • data: (to be added)
  • pos: the positions and the sequence; a vector of [(start₁:end₁, seqnum₁), (start₂:end₂, seqnum₂), ...]
  • ps: pseudocount

Output: a position frequency matrix (each column sums to 1), i.e.,

            1     2     3     4     ...
        A   0.1   0.1   ...   ...
        C   0.2   0.1   ...   ...
        G   0.3   0.1   ...   ...
        T   0.4   0.7   ...   ...
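Since the data argument is still undocumented, here is a sketch of how such a matrix can be built, assuming the sequences are plain strings and every (start:end, seqnum) entry selects a site of the same width: count each base per column, add the pseudocount ps to every cell, and divide each column by its total, i.e. F[a, j] = (c[a, j] + ps) / Σ_b (c[b, j] + ps). The name str2freq_sketch and the seqs representation are hypothetical; rows follow the A, C, G, T order of the table above, and non-ACGT characters are skipped.

```julia
# Hypothetical sketch; `seqs` stands in for the undocumented `data` argument.
function str2freq_sketch(seqs::AbstractVector{<:AbstractString},
                         pos::AbstractVector{<:Tuple{UnitRange{Int},Int}},
                         ps::Real = 0.01)
    isempty(pos) && throw(ArgumentError("pos must be non-empty"))
    w = length(first(pos)[1])
    all(length(r) == w for (r, _) in pos) ||
        throw(ArgumentError("all sites must have the same width"))
    row = Dict('A' => 1, 'C' => 2, 'G' => 3, 'T' => 4)
    counts = fill(float(ps), 4, w)          # start every cell at the pseudocount
    for (r, n) in pos
        site = uppercase(seqs[n][r])        # substring start:end of sequence n
        for (j, c) in enumerate(site)
            haskey(row, c) && (counts[row[c], j] += 1)  # skip N and other symbols
        end
    end
    return counts ./ sum(counts; dims = 1)  # normalize each column to sum to 1
end
```

With seqs = ["ACGT", "TACG"] and pos = [(1:3, 1), (2:4, 2)], both sites read "ACG"; with ps = 1 the first column becomes (3, 1, 1, 1) / 6, so its A entry is 0.5.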
