Some members here are aware that Brian M. and I have been obtaining fig DNA marker data from a lab in Sacramento. Brian’s interest is in finding synonyms among fig cultivars whereas I’ve been capturing them to see if they also occur in whole genome sequencing of figs I have underway at Arizona Genomics Institute.
Brian has found that the Black Madeira and Figo Preto have identical “fingerprints” according to the lab markers.
I was curious about this and sent samples of them to the lab along with several others that are being sequenced at AGI. Here is the raw marker data I received in return: 251899 Fig.xlsx (12.8 KB)
I then processed the data into a more standard form, showing the counts returned (or lack thereof) for each marker. In the table, the F and R refer to the two parts of each marker which are listed in the file above. I’ve put Black Madeira and Figo Preto in the first two data columns so you can see they really are identical in this marker set:
Columns, in order: Black Madeira; Figo Preto; Bardacik; Burgan Unknown; Filacciano Bianco; Janice Kadota Seedless; Martinenca Rimada; Nuestra Senora del Carmen; Unknown Pastiliere; Violette de Bordeaux

Rows with fewer than ten values contain blank cells (no count returned); the positions of those blanks are not shown.

Frub398 F: 189; 189; 189; 187,189; 187; 187; 189; 187; 187,189; 187,189
Frub398 R: 191; 191; 191; 191,193; 193; 193; 191; 193; 191,193; 191
V230 F: 109; 109; 102; 109; 109; 109; 102; 102; 109
V230 R: 115; 115; 115; 115; 115; 115
Frub391 F: 173; 173; 169; 165; 171; 173; 165; 177; 169; 165
Frub391 R: 175; 175; 173; 171; 173; 186; 173; 186; 186; 171
Frac241 F: 284; 284; 282,284; 284; 284; 284; 284; 284; 284
Frac241 R: 288; 288; 288
V149 F: 215; 215; 207; 215; 207; 215; 215; 215; 200; 207
V149 R: 219; 219; 219; 219; 219; 219; 219; 215; 215
FCUP008 F: 154; 154; 152; 154; 154; 154; 154; 152
FCUP008 R: 171; 171; 169,171; 173; 173,177; 171; 171; 171; 171
FCUP027 F: 196; 196; 192,196; 196; 192,196; 192
FCUP027 R: 208; 208; 202,208; 208; 208; 208; 208; 208
FCUP016 F: 162; 162; 162; 162; 162; 162; 162; 162; 162; 166
FCUP016 R: 168; 168; 168; 168; 168
FCUP038 F: 160; 160; 158; 166; 160,166; 160; 158,166; 166; 158
FCUP038 R: 183; 183; 173,179; 181; 183; 183; 183; 172,175; 170
FCUP069 F: 183,185; 183,185; 183; 183; 183,185; 183,185; 183; 175
FCUP069 R: 191,193; 193; 193; 191,193; 193; 191,193
FCUP044 F: 199,201; 199,201; 201; 197,201; 197; 197,199; 199; 200; 197,201
FCUP044 R: 208; 208; 208; 208
Frac13 F: 123; 123; 123; 123; 123; 123; 123; 123
Frac13 R: 138; 138; 138; 138; 138; 136
FCUP042 F: 169; 169; 169; 169; 169; 169; 169; 169; 167,169; 169
FCUP042 R: 175
FCUP045 F: 121; 121; 121; 121; 121; 121; 121; 119,121; 121
FCUP045 R: 131; 131; 131; 131; 131; 131; 131
FCUP070 F: 165; 167,169
FCUP070 R: 171,175; 171,175; 183; 171; 171,173; 171,173; 175,179; 173,175; 173; 179
The markers in use are not ideal, but they are good for ferreting out possible synonyms. Beyond that, though, they have little utility. For example, if you want to get an idea of the relationships between these figs, then further vetting of the markers is needed to eliminate those with missing values. Here’s what you get:
| Marker | Bardacik | Black Madeira | Burgan Unknown | Filacciano Bianco | Janice Kadota Seedless | Martinenca Rimada | Nuestra Senora del Carmen | Unknown Pastiliere | Violette de Bordeaux |
|---|---|---|---|---|---|---|---|---|---|
| Frub398 F | {189} | {189} | {187,189} | {187} | {187} | {189} | {187} | {187,189} | {187,189} |
| Frub398 R | {191} | {191} | {191,193} | {193} | {193} | {191} | {193} | {191,193} | {191} |
| Frub391 F | {169} | {173} | {165} | {171} | {173} | {165} | {177} | {169} | {165} |
| Frub391 R | {173} | {175} | {171} | {173} | {186} | {173} | {186} | {186} | {171} |
| V149 F | {207} | {215} | {215} | {207} | {215} | {215} | {215} | {200} | {207} |
| FCUP016 F | {162} | {162} | {162} | {162} | {162} | {162} | {162} | {162} | {166} |
| FCUP042 F | {169} | {169} | {169} | {169} | {169} | {169} | {169} | {167,169} | {169} |
| FCUP070 R | {183} | {171,175} | {171} | {171,173} | {171,173} | {175,179} | {173,175} | {173} | {179} |
From that, a diagram of nearest-neighbor relations can be made, showing the number of marker mismatches between each pair of figs. Notice the unlikely close relations to Janice Kadota Seedless.
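For anyone who wants to reproduce counts like those behind such a diagram, here is a minimal Python sketch. It uses the eight-marker table above and assumes that a marker counts as a mismatch whenever the two allele sets are not identical. That counting rule is an assumption, since the exact rule used for the diagram is not stated, and the function names are made up for illustration.

```python
# Allele sets transcribed from the eight-marker table in this post.
CULTIVARS = [
    "Bardacik", "Black Madeira", "Burgan Unknown", "Filacciano Bianco",
    "Janice Kadota Seedless", "Martinenca Rimada",
    "Nuestra Senora del Carmen", "Unknown Pastiliere", "Violette de Bordeaux",
]

MARKERS = {
    "Frub398 F": ["189", "189", "187,189", "187", "187", "189", "187", "187,189", "187,189"],
    "Frub398 R": ["191", "191", "191,193", "193", "193", "191", "193", "191,193", "191"],
    "Frub391 F": ["169", "173", "165", "171", "173", "165", "177", "169", "165"],
    "Frub391 R": ["173", "175", "171", "173", "186", "173", "186", "186", "171"],
    "V149 F": ["207", "215", "215", "207", "215", "215", "215", "200", "207"],
    "FCUP016 F": ["162", "162", "162", "162", "162", "162", "162", "162", "166"],
    "FCUP042 F": ["169", "169", "169", "169", "169", "169", "169", "167,169", "169"],
    "FCUP070 R": ["183", "171,175", "171", "171,173", "171,173", "175,179", "173,175", "173", "179"],
}


def allele_sets(cultivar):
    """Return {marker: frozenset of allele strings} for one cultivar."""
    i = CULTIVARS.index(cultivar)
    return {m: frozenset(row[i].split(",")) for m, row in MARKERS.items()}


def mismatches(a, b):
    """Count markers whose allele sets differ (assumed rule: strict set equality)."""
    sa, sb = allele_sets(a), allele_sets(b)
    return sum(sa[m] != sb[m] for m in MARKERS)


def nearest_neighbors(cultivar):
    """Return (smallest mismatch count, sorted list of cultivars at that count)."""
    dists = {c: mismatches(cultivar, c) for c in CULTIVARS if c != cultivar}
    best = min(dists.values())
    return best, sorted(c for c, d in dists.items() if d == best)


if __name__ == "__main__":
    for c in CULTIVARS:
        d, nn = nearest_neighbors(c)
        print(f"{c}: {d} mismatches -> {', '.join(nn)}")
```

A looser rule, such as counting a match when the two sets share any allele, would give different numbers, so treat the output as one possible reading of the table.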
I am impressed by the tremendous effort that you are putting into this, and am excited to watch the results continue to unravel the confusing story of how different fig cultivars are related to each other. Keep up the good work @Richard!
It’s so wild that these figs can exhibit such differences in ripening times, and sometimes even flavor, despite having the same DNA. It’s incredible how much of an effect epigenetics has.
I’m very much looking forward to more revelations as far as synonyms and closely related cousins.
They do not have the same DNA. Instead, 11 markers returned the same fluorescence values (including blank) when annealed with the chromosomes of the specimens.
This is very cool to see; there’s so much left to learn by applying even classic molecular biology methods to commonly cultivated fruit! It’s been cost-prohibitive to do whole genome sequencing until recently, and it’s still a bit too pricey for the hobby market: we’re talking roughly 1k USD per plant genome, and then you still need some expertise to generate a good genome assembly.
Out of curiosity, can you share more details about the markers being tested? It’s obviously some kind of PCR or hybridization method given the primer sequences you shared. But what are the marker values exactly? Are these PCR amplicon sizes?
You cannot get a reasonably accurate perennial plant genome for US$ 1k, or even 5k.
They are among a standard set of markers used in kits sold to university labs and also by manufacturers of automated SSR hardware, e.g. Illumina. If I were to search for the origin I’d try the BMC journal dedicated to markers. They were selected by the lab for Brian’s project, and I stuck with them for consistency. Note that Brian and I have very different goals – see OP.
> You cannot get a reasonably accurate perennial plant genome for US$ 1k, or even 5k.

You absolutely can, but only at scale. Not as a single sample, and not direct to consumer. And to be clear I’m just talking sequencing cost here (though sample prep doesn’t really add that much extra cost, maybe another $100). But you might be surprised by how much prices have come down in the last few years. Just this year the prices dropped nearly 3x once again for both Illumina and PacBio. Illumina is cheaper of course but you won’t get any decent assembly from that alone. But long read sequencing has come a long way and you can get some darn good assemblies with just PacBio. If you’re going for something complicated like a hexaploid you might need to throw in some additional Hi-C to get homologous chromosomes assembled.
But you don’t have to take my word for it!
U.S. list price is $995 for sequencing reagents for one Revio SMRT Cell, which has an expected yield of 90 Gb, equivalent to a 30X human genome. (source)
Note that quoted price is for a human genome at about 3 gigabases haploid, which is a few times larger than many plant genomes (though some of them do get crazy big like redwood at >26 gigabases!).
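To make the arithmetic behind that quote explicit, here is a rough Python sketch. The 90 Gb yield, $995 list price, and 3 Gb human genome come from the text above. The 0.5 Gb genome is a purely hypothetical example size, and the pro-rated cost assumes a SMRT Cell can be shared perfectly between samples, which is an idealization. It covers sequencing reagents only.

```python
def coverage(yield_gb, genome_gb):
    """Average sequencing depth: total bases sequenced / haploid genome size."""
    return yield_gb / genome_gb


def reagent_cost(genome_gb, target_coverage, cell_yield_gb=90.0, cell_price_usd=995.0):
    """Pro-rated reagent cost for one genome at the target depth.

    Assumes the cell's yield can be split perfectly across samples
    (an idealization); excludes sample prep, labor, and analysis.
    """
    bases_needed_gb = genome_gb * target_coverage
    return cell_price_usd * bases_needed_gb / cell_yield_gb


if __name__ == "__main__":
    # Human genome, ~3 Gb haploid: one full cell gives about 30X.
    print(coverage(90.0, 3.0), reagent_cost(3.0, 30))
    # Hypothetical 0.5 Gb plant genome at the same 30X depth.
    print(coverage(90.0, 0.5), round(reagent_cost(0.5, 30), 2))
```

This is where the "only at scale" caveat comes in: the per-genome figure only drops if enough samples are pooled to fill each cell.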
Yeah there will always be outliers, and to estimate the cost you’d need to know roughly the size of the genome (there are methods to do this without sequencing just based on the mass of the chromosomes themselves).
But it’s exciting to think how much we’ll be able to learn by widespread sequencing of just the handful of the most common commercial fruit trees like apples and stonefruit. All of those are on the order of 5-10x smaller than a human genome (ok there’s triploid apples those are going to take a bit more). And the tech has advanced to the point where this is done routinely for de novo genome assembly (i.e. no reliance at all on mapping to anything previously sequenced).
And to the topic at hand: fig genomes are only a bit over 300 megabases! It’s feasible to do those for hundreds of dollars apiece using current tech! After you buy a million dollar sequencer and plop it down in your 10 million dollar lab.
That’s why I think your Fig project is very cool, it’s a great first look into the nearly unexplored world of “heirloom” plant genomics!
Then provide me with a quote from your lab for a high quality Ficus carica reference genome including all costs (sample preparation, PacBio Sequel sequencing, HQ analysis, resequencing, etc. through library construction).
I really admire this work and will follow any further studies.
Pardon my ignorance of DNA markers, but in trying to grasp the data you shared I have some questions that I would greatly appreciate having answered.
I’m guessing the DNA marker data in the sheet is in kb units, showing the positions of the electrophoresis results, which would allow a genetic distance to be computed via the Jaccard coefficient?
As far as I know, to expand an existing distance relationship diagram you need the raw marker data (electrophoresis data) from the previous study, and you need to use the same markers (primers).
If I am aiming to expand the work of Ikegami (2008), do you personally think Ikegami would provide such data to a random person like myself? I posted this question on ourfigs but got less than clear answers.
Is there a specific reason why you chose to make a distance relationship diagram that is not a dendrogram? (Is it because you view Aradhya’s 2010 approach as flawed?)
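For readers unfamiliar with the Jaccard coefficient mentioned above, here is a minimal Python sketch of how it could be applied to marker data. Each cultivar is treated as a set of (marker, allele) pairs, using three markers from the table earlier in the thread. Whether Jaccard is an appropriate measure for these markers is not settled here; this only illustrates the calculation.

```python
def jaccard_similarity(a, b):
    """|A intersect B| / |A union B|; two empty sets are treated as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# (marker, allele) pairs for three markers, taken from the table in this thread.
black_madeira = {
    ("Frub398 F", 189), ("Frub391 F", 173),
    ("FCUP070 R", 171), ("FCUP070 R", 175),
}
janice_kadota_seedless = {
    ("Frub398 F", 187), ("Frub391 F", 173),
    ("FCUP070 R", 171), ("FCUP070 R", 173),
}

similarity = jaccard_similarity(black_madeira, janice_kadota_seedless)
distance = 1 - similarity  # Jaccard distance
print(f"similarity={similarity:.3f}, distance={distance:.3f}")
```

Here two of the six distinct (marker, allele) pairs are shared, so the similarity is 2/6 and the distance is 4/6.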